Server Admin Log/Archive 66

From Wikitech

2023-05-31

  • 21:00 zabe@deploy1002: Finished scap: Backport for Start reading from rev_comment_id everywhere (T299954) (duration: 07m 44s)
  • 20:57 Amir1: foreachwikiindblist group1 extensions/AbuseFilter/maintenance/MigrateActorsAF.php (T336224)
  • 20:55 Amir1: foreachwikiindblist group0 extensions/AbuseFilter/maintenance/MigrateActorsAF.php (T336224)
  • 20:54 zabe@deploy1002: zabe: Backport for Start reading from rev_comment_id everywhere (T299954) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:52 zabe@deploy1002: Started scap: Backport for Start reading from rev_comment_id everywhere (T299954)
  • 20:40 urbanecm@deploy1002: Finished scap: Backport for Fix description link icon positioning (T329364) (duration: 12m 51s)
  • 20:30 mforns@deploy1002: Finished deploy [analytics/refinery@04c11e6] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@04c11e6] (duration: 01m 29s)
  • 20:28 urbanecm@deploy1002: arlolra and urbanecm: Backport for Fix description link icon positioning (T329364) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:28 mforns@deploy1002: Started deploy [analytics/refinery@04c11e6] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@04c11e6]
  • 20:28 mforns@deploy1002: Finished deploy [analytics/refinery@04c11e6] (thin): Regular analytics weekly train THIN [analytics/refinery@04c11e6] (duration: 00m 04s)
  • 20:28 mforns@deploy1002: Started deploy [analytics/refinery@04c11e6] (thin): Regular analytics weekly train THIN [analytics/refinery@04c11e6]
  • 20:27 mforns@deploy1002: Finished deploy [analytics/refinery@04c11e6]: Regular analytics weekly train [analytics/refinery@04c11e6] (duration: 05m 53s)
  • 20:27 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns1004']
  • 20:27 urbanecm@deploy1002: Started scap: Backport for Fix description link icon positioning (T329364)
  • 20:26 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1004']
  • 20:26 urbanecm@deploy1002: Finished scap: Backport for Enable EditInSequence for beta-testing on napwikisource (T337472) (duration: 10m 09s)
  • 20:22 mforns@deploy1002: Started deploy [analytics/refinery@04c11e6]: Regular analytics weekly train [analytics/refinery@04c11e6]
  • 20:18 urbanecm@deploy1002: soda and urbanecm: Backport for Enable EditInSequence for beta-testing on napwikisource (T337472) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:17 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns1006']
  • 20:16 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns1005']
  • 20:16 urbanecm@deploy1002: Started scap: Backport for Enable EditInSequence for beta-testing on napwikisource (T337472)
  • 20:15 urbanecm@deploy1002: Finished scap: Backport for Enables ab test for multiple languages (T336969) (duration: 11m 56s)
  • 20:10 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dns1004']
  • 20:09 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1004']
  • 20:07 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1006']
  • 20:07 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1005']
  • 20:07 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns1004']
  • 20:05 urbanecm@deploy1002: ksarabia and urbanecm: Backport for Enables ab test for multiple languages (T336969) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:03 urbanecm@deploy1002: Started scap: Backport for Enables ab test for multiple languages (T336969)
  • 19:57 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1004']
  • 19:55 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dns1004']
  • 19:55 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1004']
  • 19:55 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns1004']
  • 19:54 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1004']
  • 19:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:20 JustHann_: we started swapping in dumpsdata1006 as primary nfs dumps server, replacing dumpsdata1005 at 16:55 UTC and completed at 19:09 UTC
  • 19:19 robh@cumin1001: START - Cookbook sre.hosts.provision for host dns1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:18 robh@cumin1001: START - Cookbook sre.hosts.provision for host dns1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:17 robh@cumin1001: START - Cookbook sre.hosts.provision for host dns1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:16 robh@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['dns1004']
  • 19:16 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1004']
  • 19:14 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dns1006']
  • 19:14 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dns1005']
  • 19:14 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dns1004']
  • 19:13 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1006']
  • 19:13 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1005']
  • 19:13 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1004']
  • 19:12 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns1006
  • 19:11 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dns1006
  • 19:11 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns1005
  • 19:10 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dns1005
  • 19:09 JustHann_: swapping in dumpsdata1006 as primary nfs dumps server, replacing dumpsdata1005 now completed!
  • 18:59 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns1004
  • 18:58 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dns1004
  • 18:57 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:57 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new dns100[345] - robh@cumin1001"
  • 18:56 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new dns100[345] - robh@cumin1001"
  • 18:53 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 18:40 dduvall@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.11 refs T337525 (duration: 06m 02s)
  • 18:38 mforns@deploy1002: Finished deploy [airflow-dags/analytics_product@b3eb622]: (no justification provided) (duration: 00m 07s)
  • 18:38 mforns@deploy1002: Started deploy [airflow-dags/analytics_product@b3eb622]: (no justification provided)
  • 18:34 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.11 refs T337525
  • 18:01 ran `wikiadmin2023@10.64.32.139(huwiki)> UPDATE thread SET thread_signature = '<span title="bétaverzió"> <!--<font style="text-decoration: blink;">--><font color="red">♥</font><font color="white">♥</font><font color="green">♥</font> </font> [[User:Gubbubu|<font color="green" face="Lucida calligraphy">Γουββος Θιλο' WHERE thread_id = 1288;` (with `BEGIN`/`COMMIT`) for T337700
  • 17:31 ladsgroup@deploy1002: Backport cancelled.
  • 17:30 ladsgroup@deploy1002: Finished scap: Backport for Remove legacy encoding option from dawiktionary (T128155) (duration: 12m 54s)
  • 17:18 ladsgroup@deploy1002: ladsgroup: Backport for Remove legacy encoding option from dawiktionary (T128155) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 17:17 ladsgroup@deploy1002: Started scap: Backport for Remove legacy encoding option from dawiktionary (T128155)
  • 17:10 brett@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in codfw, blocking deploys T322937 (duration: 11m 07s)
  • 17:10 brett: Maglev LVS scheduler rollout in codfw finished - T263797
  • 17:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on P{cp[2037,2039,2041].codfw.wmnet} and A:cp
  • 16:59 brett@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in codfw, blocking deploys T322937
  • 16:37 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@5379d83]: (no justification provided) (duration: 00m 34s)
  • 16:37 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@5379d83]: (no justification provided)
  • 16:22 elukey: `systemctl reset-failed session-c6111.scope session-c7230.scope` on stat1005 to clear old alerts
  • 16:20 vgutierrez@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on P{cp[2037,2039,2041].codfw.wmnet} and A:cp
  • 16:13 vgutierrez: repool cp2035 - T337247 T323557
  • 16:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp2035.codfw.wmnet
  • 16:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp2035.codfw.wmnet
  • 16:10 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:10 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:08 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:08 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:04 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:04 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:51 Emperor: swift delete virtual machines from "swift" WMCS project
  • 15:51 brett@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in codfw, blocking deploys T322937
  • 15:50 brett@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys T322937 (duration: 02m 24s)
  • 15:48 brett@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys T322937
  • 15:47 Emperor: delete virtual machines from "swift" WMCS project
  • 15:45 vgutierrez: cp2035 depooled as puppet is unable to run due to ipmi issues - T337247
  • 15:42 brett: Maglev LVS scheduler rollout began IN PROGRESS, not finished - T263797
  • 15:42 brett: Maglev LVS scheduler rollout finished in codfw - T263797
  • 15:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: ipmi/mgmt console issues
  • 15:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp2035.codfw.wmnet with reason: ipmi/mgmt console issues
  • 15:39 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=1) rolling custom on A:cp-text_codfw
  • 14:55 klausman@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 14:55 klausman@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 14:54 klausman@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 14:54 klausman@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 14:50 klausman@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 14:50 klausman@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 14:44 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:43 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:41 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 14:27 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:27 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:25 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:25 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:25 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:14 urbanecm@deploy1002: Finished scap: Backport for NewImpact: Cache empty user impact on account creation (T337320), Personalized praise: Fix first-ever notifications (T322452) (duration: 07m 26s)
  • 14:08 urbanecm@deploy1002: urbanecm: Backport for NewImpact: Cache empty user impact on account creation (T337320), Personalized praise: Fix first-ever notifications (T322452) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:07 urbanecm@deploy1002: Started scap: Backport for NewImpact: Cache empty user impact on account creation (T337320), Personalized praise: Fix first-ever notifications (T322452)
  • 14:02 urbanecm@deploy1002: Finished scap: Backport for Personalized praise: Fix first-ever notifications (T322452), DeleteAction: Replace remaining OOUI fields (T337809) (duration: 11m 11s)
  • 14:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T329049)
  • 13:58 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:57 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:56 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:56 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:55 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:55 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:55 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:55 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:54 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:53 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:53 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:53 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:53 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 13:53 urbanecm@deploy1002: daimona and urbanecm: Backport for Personalized praise: Fix first-ever notifications (T322452), DeleteAction: Replace remaining OOUI fields (T337809) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 13:52 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:51 urbanecm@deploy1002: Started scap: Backport for Personalized praise: Fix first-ever notifications (T322452), DeleteAction: Replace remaining OOUI fields (T337809)
  • 13:46 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T329049)
  • 13:44 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:44 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:42 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:41 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:41 urbanecm@deploy1002: Finished scap: Backport for Enable wgMinervaEnableSiteNotice for bnwikiquote (T337683) (duration: 10m 01s)
  • 13:39 ottomata: destroy mw-page-content-change-enrich deployment in dse-k8s-eqiad in order to deploy in wikikube - T330507
  • 13:35 godog: rm cadvisor.service symlink/alias and restart kubelet on affected hosts - T337836
  • 13:33 urbanecm@deploy1002: mdsshakil and urbanecm: Backport for Enable wgMinervaEnableSiteNotice for bnwikiquote (T337683) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:31 urbanecm@deploy1002: Started scap: Backport for Enable wgMinervaEnableSiteNotice for bnwikiquote (T337683)
  • 13:29 vgutierrez@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-text_codfw
  • 13:28 urbanecm@deploy1002: Finished scap: Backport for [Growth] Enable new Impact for 10 additional wikis (T336203) (duration: 08m 13s)
  • 13:21 urbanecm@deploy1002: urbanecm: Backport for [Growth] Enable new Impact for 10 additional wikis (T336203) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 13:20 urbanecm@deploy1002: Started scap: Backport for [Growth] Enable new Impact for 10 additional wikis (T336203)
  • 13:19 urbanecm@deploy1002: Finished scap: Backport for Revert "Revert "Switch VisualEditor to not use RESTbase on small and medium wikis"" (duration: 15m 10s)
  • 13:17 volans: uploaded spicerack_7.2.0 to apt.wikimedia.org bullseye-wikimedia
  • 13:15 mforns@deploy1002: Finished deploy [airflow-dags/analytics_product@5a38fbf]: (no justification provided) (duration: 00m 06s)
  • 13:15 mforns@deploy1002: Started deploy [airflow-dags/analytics_product@5a38fbf]: (no justification provided)
  • 13:06 urbanecm@deploy1002: urbanecm and d3r1ck01: Backport for Revert "Revert "Switch VisualEditor to not use RESTbase on small and medium wikis"" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:04 urbanecm@deploy1002: Started scap: Backport for Revert "Revert "Switch VisualEditor to not use RESTbase on small and medium wikis""
  • 12:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on P{cp[2028,2030,2032,2034,2036,2038,2040].codfw.wmnet} and A:cp
  • 12:03 jayme: re-enabling puppet on all kubernetes hosts
  • 12:00 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:59 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:54 jayme: disabled puppet on all kubernetes hosts apart from staging-codfw for https://gerrit.wikimedia.org/r/c/operations/puppet/+/924905
  • 11:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2010*} and A:lvs (T329049)
  • 11:20 apergos: rebooted dumpsdata1006 manually after several timeouts trying to use the cookbook; in the end, forced to powercycle the host via mgmt console
  • 11:18 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2010*} and A:lvs (T329049)
  • 11:14 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doc2001.codfw.wmnet
  • 11:14 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:14 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
  • 11:12 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
  • 11:07 eoghan@cumin1001: START - Cookbook sre.dns.netbox
  • 10:34 ladsgroup@deploy1002: Finished scap: Backport for mwscript: Avoid prepending maintenance/ if >= 2 dots in argument (T336819) (duration: 08m 50s)
  • 10:34 eoghan@cumin1001: START - Cookbook sre.hosts.decommission for hosts doc2001.codfw.wmnet
  • 10:27 ladsgroup@deploy1002: ladsgroup: Backport for mwscript: Avoid prepending maintenance/ if >= 2 dots in argument (T336819) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 10:25 ladsgroup@deploy1002: Started scap: Backport for mwscript: Avoid prepending maintenance/ if >= 2 dots in argument (T336819)
  • 10:21 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host dumpsdata1006.eqiad.wmnet
  • 10:20 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1006.eqiad.wmnet
  • 10:17 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host dumpsdata1006.eqiad.wmnet
  • 10:16 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1006.eqiad.wmnet
  • 10:12 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host dumpsdata1006.eqiad.wmnet
  • 10:12 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1006.eqiad.wmnet
  • 10:11 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doc1002.eqiad.wmnet
  • 10:11 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:11 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
  • 10:08 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
  • 10:07 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host dumpsdata1006.eqiad.wmnet
  • 10:07 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1006.eqiad.wmnet
  • 10:01 vgutierrez@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on P{cp[2028,2030,2032,2034,2036,2038,2040].codfw.wmnet} and A:cp
  • 10:01 eoghan@cumin1001: START - Cookbook sre.dns.netbox
  • 09:57 eoghan@cumin1001: START - Cookbook sre.hosts.decommission for hosts doc1002.eqiad.wmnet
  • 09:56 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol2004-dev.wikimedia.org
  • 09:56 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:56 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
  • 09:49 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48665 and previous config saved to /var/cache/conftool/dbconfig/20230531-093659-root.json
  • 09:36 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 09:35 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 09:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on P{cp2042.codfw.wmnet} and A:cp
  • 09:23 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2004-dev.wikimedia.org
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48663 and previous config saved to /var/cache/conftool/dbconfig/20230531-092154-root.json
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48662 and previous config saved to /var/cache/conftool/dbconfig/20230531-090649-root.json
  • 09:01 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: bast2002.wikimedia.org
  • 09:01 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: bast2002.wikimedia.org
  • 09:00 vgutierrez@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on P{cp2042.codfw.wmnet} and A:cp
  • 08:59 fabfur: Testing new cookbook to switch port 80 from Varnish to HAProxy on cp2042
  • 08:58 mvernon@cumin1001: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1009.eqiad.wmnet
  • 08:58 mvernon@cumin1001: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1009.eqiad.wmnet
  • 08:55 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: labstore1005.eqiad.wmnet
  • 08:55 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: labstore1005.eqiad.wmnet
  • 08:55 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: labstore1004.eqiad.wmnet
  • 08:55 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: labstore1004.eqiad.wmnet
  • 08:52 moritzm: manually run puppet node clean/deactivate for labstore1004/1005 (which run into a traceback in the decom script) T337269
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48661 and previous config saved to /var/cache/conftool/dbconfig/20230531-085145-root.json
  • 08:41 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1009.eqiad.wmnet with OS bullseye
  • 08:39 kostajh: UTC morning deploys done
  • 08:37 kharlan@deploy1002: Finished scap: Backport for NewImpact: Cache empty user impact on account creation (T337320) (duration: 13m 48s)
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48660 and previous config saved to /var/cache/conftool/dbconfig/20230531-083640-root.json
  • 08:25 kharlan@deploy1002: kharlan: Backport for NewImpact: Cache empty user impact on account creation (T337320) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:24 kharlan@deploy1002: Started scap: Backport for NewImpact: Cache empty user impact on account creation (T337320)
  • 08:22 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1009.eqiad.wmnet with reason: host reimage
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48659 and previous config saved to /var/cache/conftool/dbconfig/20230531-082135-root.json
  • 08:20 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1009.eqiad.wmnet with reason: host reimage
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48658 and previous config saved to /var/cache/conftool/dbconfig/20230531-080631-root.json
  • 08:04 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1009.eqiad.wmnet with OS bullseye
  • 07:58 legoktm@deploy1002: Finished scap: Backport for Remove GWToolset configuration (2/2) (T270911) (duration: 38m 58s)
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48657 and previous config saved to /var/cache/conftool/dbconfig/20230531-075126-root.json
  • 07:41 legoktm@deploy1002: legoktm: Backport for Remove GWToolset configuration (2/2) (T270911) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 07:35 godog: apache2 restarted on logstash1032 before I could get a backtrace to debug logstash lag
  • 07:19 legoktm@deploy1002: Started scap: Backport for Remove GWToolset configuration (2/2) (T270911)
  • 07:17 legoktm@deploy1002: Finished scap: Backport for Remove GWToolset configuration (1/2) (T270911) (duration: 09m 51s)
  • 07:09 legoktm@deploy1002: legoktm: Backport for Remove GWToolset configuration (1/2) (T270911) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 07:07 legoktm@deploy1002: Started scap: Backport for Remove GWToolset configuration (1/2) (T270911)
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48656 and previous config saved to /var/cache/conftool/dbconfig/20230531-070730-root.json
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48655 and previous config saved to /var/cache/conftool/dbconfig/20230531-065225-root.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48654 and previous config saved to /var/cache/conftool/dbconfig/20230531-064327-root.json
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48653 and previous config saved to /var/cache/conftool/dbconfig/20230531-063721-root.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48652 and previous config saved to /var/cache/conftool/dbconfig/20230531-062823-root.json
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48651 and previous config saved to /var/cache/conftool/dbconfig/20230531-062216-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48650 and previous config saved to /var/cache/conftool/dbconfig/20230531-061318-root.json
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48649 and previous config saved to /var/cache/conftool/dbconfig/20230531-060710-root.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48648 and previous config saved to /var/cache/conftool/dbconfig/20230531-055813-root.json
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48647 and previous config saved to /var/cache/conftool/dbconfig/20230531-055205-root.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48646 and previous config saved to /var/cache/conftool/dbconfig/20230531-054308-root.json
  • 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48645 and previous config saved to /var/cache/conftool/dbconfig/20230531-053700-root.json
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48644 and previous config saved to /var/cache/conftool/dbconfig/20230531-052804-root.json
  • 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48643 and previous config saved to /var/cache/conftool/dbconfig/20230531-052156-root.json
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48642 and previous config saved to /var/cache/conftool/dbconfig/20230531-051259-root.json
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1221 (sanitarium s4 master) T337446', diff saved to https://phabricator.wikimedia.org/P48640 and previous config saved to /var/cache/conftool/dbconfig/20230531-045927-root.json
  • 04:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48639 and previous config saved to /var/cache/conftool/dbconfig/20230531-045754-root.json
  • 02:19 eileen: civicrm upgraded from 5905a403 to 885208ca

2023-05-30

  • 23:38 zabe@deploy1002: Finished scap: Backport for Start reading from rev_comment_id in group1 wikis (T299954) (duration: 08m 00s)
  • 23:31 zabe@deploy1002: zabe: Backport for Start reading from rev_comment_id in group1 wikis (T299954) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 23:30 zabe@deploy1002: Started scap: Backport for Start reading from rev_comment_id in group1 wikis (T299954)
  • 22:22 ejegg: civicrm upgraded from 415aa7e5 to 5905a403
  • 21:56 samtar@deploy1002: Finished scap: Backport for linker: Check for null parser in Linker::makeThumbLink2 (T337794) (duration: 07m 48s)
  • 21:50 samtar@deploy1002: jforrester and samtar: Backport for linker: Check for null parser in Linker::makeThumbLink2 (T337794) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 21:48 samtar@deploy1002: Started scap: Backport for linker: Check for null parser in Linker::makeThumbLink2 (T337794)
  • 20:58 ladsgroup@deploy1002: ladsgroup: Backport for Add WANCache to ParserOutputPageProperties::finalize (T336698) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:57 ladsgroup@deploy1002: Started scap: Backport for Add WANCache to ParserOutputPageProperties::finalize (T336698)
  • 20:40 ladsgroup@deploy1002: Finished scap: Backport for Add WANCache to ParserOutputPageProperties::finalize (T336698) (duration: 09m 27s)
  • 20:32 ladsgroup@deploy1002: ladsgroup: Backport for Add WANCache to ParserOutputPageProperties::finalize (T336698) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:30 ladsgroup@deploy1002: Started scap: Backport for Add WANCache to ParserOutputPageProperties::finalize (T336698)
  • 20:12 inflatador: bking@wdqs2009 depool wdqs2009 until it catches up with lag
  • 20:10 samtar@deploy1002: Finished scap: Backport for Turn on A/B Test Hebrew (T336969) (duration: 08m 46s)
  • 20:03 samtar@deploy1002: ksarabia and samtar: Backport for Turn on A/B Test Hebrew (T336969) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:01 samtar@deploy1002: Started scap: Backport for Turn on A/B Test Hebrew (T336969)
  • 19:48 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@cd667c2]: Deploy Iceberg version of referrer_daily on analytics Airflow instance. T335305. (duration: 00m 09s)
  • 19:48 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@cd667c2]: Deploy Iceberg version of referrer_daily on analytics Airflow instance. T335305.
  • 19:36 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 04m 02s)
  • 19:32 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 19:29 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 54s)
  • 19:29 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 19:29 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 16m 36s)
  • 19:24 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2021.*
  • 19:12 inflatador: [WDQS Deploy] Deploying version 0.3.124
  • 19:11 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 18:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.11 refs T337525
  • 17:45 mutante: re-enabling puppet on contint2001
  • 16:20 rzl: rzl@mwmaint1002:~$ sudo systemctl start mediawiki_job_growthexperiments-userImpactUpdateRecentlyEdited
  • 16:19 rzl: rzl@mwmaint1002:~$ sudo systemctl start mediawiki_job_growthexperiments-userImpactUpdateRecentlyRegistered
  • 16:14 urbanecm@deploy1002: Finished scap: Backport for [Growth] Enable user impact refresh on 10 more wikis (T336203) (duration: 07m 08s)
  • 16:07 urbanecm@deploy1002: Started scap: Backport for [Growth] Enable user impact refresh on 10 more wikis (T336203)
  • 16:00 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 16:00 otto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:58 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:58 otto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:57 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:56 otto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:56 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:55 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:54 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:54 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:54 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:53 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwlog2002.codfw.wmnet with OS bullseye
  • 15:51 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:51 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:49 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:49 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:15 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2005-dev.codfw.wmnet with OS bullseye
  • 15:15 aborrero@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
  • 15:14 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
  • 15:10 tgr_: UTC evening deploys done
  • 15:08 tgr@deploy1002: Finished scap: Backport for ve.ui.MWGalleryDialog: Fix showing the search panel (T337638), Hide 'editnotice-notext' message in VE (and mobile apps) (T337633), ve.ui.MWGalleryDialog: Fix showing the search panel (T337638) (duration: 08m 08s)
  • 15:05 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:03 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:02 tgr@deploy1002: tgr and matmarex: Backport for ve.ui.MWGalleryDialog: Fix showing the search panel (T337638), Hide 'editnotice-notext' message in VE (and mobile apps) (T337633), ve.ui.MWGalleryDialog: Fix showing the search panel (T337638) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 15:00 tgr@deploy1002: Started scap: Backport for ve.ui.MWGalleryDialog: Fix showing the search panel (T337638), Hide 'editnotice-notext' message in VE (and mobile apps) (T337633), ve.ui.MWGalleryDialog: Fix showing the search panel (T337638)
  • 14:50 moritzm: installing texlive-bin security updates
  • 14:49 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwlog2002.codfw.wmnet with reason: host reimage
  • 14:46 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwlog2002.codfw.wmnet with reason: host reimage
  • 14:36 tgr@deploy1002: Finished scap: Backport for Hide 'editnotice-notext' message in VE (and mobile apps) (T337633) (duration: 08m 01s)
  • 14:29 tgr@deploy1002: matmarex and tgr: Backport for Hide 'editnotice-notext' message in VE (and mobile apps) (T337633) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 14:28 herron@cumin1001: START - Cookbook sre.hosts.reimage for host mwlog2002.codfw.wmnet with OS bullseye
  • 14:27 tgr@deploy1002: Started scap: Backport for Hide 'editnotice-notext' message in VE (and mobile apps) (T337633)
  • 14:16 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mwlog2002.codfw.wmnet with OS bullseye
  • 14:16 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetdb1003.eqiad.wmnet with OS bookworm
  • 14:14 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:13 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:08 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:06 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:06 moritzm: installing libwebp security updates
  • 14:06 tgr@deploy1002: Finished scap: Backport for editpage: Change the order of hooks slightly for FlaggedRevs (T337637) (duration: 08m 14s)
  • 13:59 tgr@deploy1002: tgr and matmarex: Backport for editpage: Change the order of hooks slightly for FlaggedRevs (T337637) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:58 tgr@deploy1002: Started scap: Backport for editpage: Change the order of hooks slightly for FlaggedRevs (T337637)
  • 13:57 tgr@deploy1002: Finished scap: Backport for prod: Remove $wgCampaignEventsEnableMultipleOrganizers (T334088) (duration: 16m 13s)
  • 13:56 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2009.codfw.wmnet
  • 13:55 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe2009.codfw.wmnet
  • 13:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2009.codfw.wmnet with OS bullseye
  • 13:42 tgr@deploy1002: tgr and daimona: Backport for prod: Remove $wgCampaignEventsEnableMultipleOrganizers (T334088) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:40 tgr@deploy1002: Started scap: Backport for prod: Remove $wgCampaignEventsEnableMultipleOrganizers (T334088)
  • 13:33 mlitn@deploy1002: Finished scap: Backport for Fix maxJobs default, Fix maxJobs default (duration: 07m 39s)
  • 13:27 mlitn@deploy1002: mlitn: Backport for Fix maxJobs default, Fix maxJobs default synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:25 mlitn@deploy1002: Started scap: Backport for Fix maxJobs default, Fix maxJobs default
  • 13:20 tgr@deploy1002: Finished scap: Backport for GrowthExperiments: Re-add $wgGERestbaseUrl (duration: 09m 26s)
  • 13:13 tgr@deploy1002: tgr: Backport for GrowthExperiments: Re-add $wgGERestbaseUrl synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:11 herron@cumin1001: START - Cookbook sre.hosts.reimage for host mwlog2002.codfw.wmnet with OS bullseye
  • 13:11 tgr@deploy1002: Started scap: Backport for GrowthExperiments: Re-add $wgGERestbaseUrl
  • 13:09 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 13:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2009.codfw.wmnet with reason: host reimage
  • 13:09 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 13:09 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 13:08 bblack: lvs1018: restart pybal for wikireplicas monitoring removal
  • 13:08 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 13:06 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 13:06 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2009.codfw.wmnet with reason: host reimage
  • 13:06 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 13:04 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 13:03 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
  • 13:00 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
  • 12:51 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:51 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:48 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2009.codfw.wmnet with OS bullseye
  • 12:39 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:39 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:29 volans: disabling puppet where cadvisor is present
  • 12:14 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2005-dev.codfw.wmnet with OS bullseye
  • 11:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2006.codfw.wmnet
  • 11:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:51 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:51 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for moved cloudcontrol2005-dev - cmooney@cumin1001"
  • 11:50 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for moved cloudcontrol2005-dev - cmooney@cumin1001"
  • 11:50 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 11:47 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 11:46 slyngshede@cumin1001: START - Cookbook sre.hosts.decommission for hosts testvm2006.codfw.wmnet
  • 11:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on puppetboard2003.codfw.wmnet,puppetboard1003.eqiad.wmnet with reason: building_systems
  • 11:45 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on puppetboard2003.codfw.wmnet,puppetboard1003.eqiad.wmnet with reason: building_systems
  • 11:41 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 11:41 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 11:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 11:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 11:14 hashar@deploy1002: Finished deploy [gerrit/gerrit@6deabc9]: wm-checks-api: add support for DUCT - T331651 (duration: 00m 08s)
  • 11:14 hashar@deploy1002: Started deploy [gerrit/gerrit@6deabc9]: wm-checks-api: add support for DUCT - T331651
  • 11:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2006.codfw.wmnet
  • 11:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2006.codfw.wmnet with OS bookworm
  • 11:00 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 10:57 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 10:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2006.codfw.wmnet with reason: host reimage
  • 10:53 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 10:50 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 10:50 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2006.codfw.wmnet with reason: host reimage
  • 10:41 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 10:41 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 10:11 jbond@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetboard2003.codfw.wmnet with OS bookworm
  • 10:11 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetboard1003.eqiad.wmnet with OS bookworm
  • 10:00 zabe@deploy1002: Finished scap: Backport for Start reading from rev_comment_id in group0 wikis (T299954) (duration: 08m 12s)
  • 09:59 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host testvm2006.codfw.wmnet with OS bookworm
  • 09:58 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 09:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetdb1003.eqiad.wmnet with reason: host reimage
  • 09:57 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 09:55 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetdb1003.eqiad.wmnet with reason: host reimage
  • 09:54 zabe@deploy1002: zabe: Backport for Start reading from rev_comment_id in group0 wikis (T299954) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 09:52 zabe@deploy1002: Started scap: Backport for Start reading from rev_comment_id in group0 wikis (T299954)
  • 09:52 zabe@deploy1002: Finished scap: Backport for Check for null when using ::getCheckUserHelperFieldset (T337599) (duration: 09m 52s)
  • 09:49 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetboard2003.codfw.wmnet with reason: host reimage
  • 09:46 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetboard2003.codfw.wmnet with reason: host reimage
  • 09:43 zabe@deploy1002: zabe: Backport for Check for null when using ::getCheckUserHelperFieldset (T337599) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 09:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 09:43 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 09:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 09:42 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host puppetdb1003.eqiad.wmnet with OS bookworm
  • 09:42 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 09:42 zabe@deploy1002: Started scap: Backport for Check for null when using ::getCheckUserHelperFieldset (T337599)
  • 09:40 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 09:40 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
  • 09:37 zabe@deploy1002: Finished scap: Backport for Start reading from rev_comment_id in test wikis (T299954) (duration: 07m 48s)
  • 09:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetboard1003.eqiad.wmnet with reason: host reimage
  • 09:33 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
  • 09:33 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 09:33 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 09:33 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:33 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 09:32 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 09:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetboard1003.eqiad.wmnet with reason: host reimage
  • 09:30 zabe@deploy1002: zabe: Backport for Start reading from rev_comment_id in test wikis (T299954) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 09:30 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 09:30 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetdb2003.codfw.wmnet with OS bookworm
  • 09:29 zabe@deploy1002: Started scap: Backport for Start reading from rev_comment_id in test wikis (T299954)
  • 09:27 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 09:24 tgr@deploy1002: Finished scap: Backport for Improve handling of missing image recommendation (duration: 08m 57s)
  • 09:22 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetboard2003.codfw.wmnet with OS bookworm
  • 09:20 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetboard1003.eqiad.wmnet with OS bookworm
  • 09:19 arturo: run aborrero@cumin1001:~ 2s 98 $ sudo cumin "P{R:Profile::Mariadb::Section = 's7'} and P{P:wmcs::db::wikireplicas::mariadb_multiinstance}" "/usr/local/sbin/maintain-meta_p --all-databases --bootstrap"
  • 09:17 tgr@deploy1002: tgr: Backport for Improve handling of missing image recommendation synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 09:15 tgr@deploy1002: Started scap: Backport for Improve handling of missing image recommendation
  • 09:14 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 09:14 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 09:14 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:14 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 09:13 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 09:11 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 09:11 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
  • 09:06 tgr@deploy1002: Finished scap: Backport for Section images: Do not treat unexpected kinds as production errors (duration: 14m 22s)
  • 09:00 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
  • 09:00 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 09:00 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 09:00 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:00 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:59 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:54 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 08:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 08:53 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 08:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:53 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:53 tgr@deploy1002: tgr: Backport for Section images: Do not treat unexpected kinds as production errors synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 08:52 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:51 tgr@deploy1002: Started scap: Backport for Section images: Do not treat unexpected kinds as production errors
  • 08:50 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 08:50 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
  • 08:49 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
  • 08:49 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 08:49 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 08:49 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:49 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:48 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:44 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 08:44 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 08:44 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 08:44 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:44 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:43 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:41 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 08:41 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
  • 08:39 tgr@deploy1002: Finished scap: Backport for Improve logging of invalid image recommendation kinds (duration: 10m 30s)
  • 08:39 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
  • 08:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 08:39 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 08:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:38 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:36 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 08:36 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:35 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:35 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 08:34 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 08:34 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:34 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:33 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:31 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 08:31 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
  • 08:30 tgr@deploy1002: tgr: Backport for Improve logging of invalid image recommendation kinds synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:29 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
  • 08:29 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 08:29 tgr@deploy1002: Started scap: Backport for Improve logging of invalid image recommendation kinds
  • 08:29 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 08:28 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:28 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:27 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:27 jayme: re-enable puppet on P:kubernetes::node for https://gerrit.wikimedia.org/r/c/operations/puppet/+/909687
  • 08:25 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 08:20 jayme: disable puppet on P:kubernetes::node (apart from staging-codfw) for https://gerrit.wikimedia.org/r/c/operations/puppet/+/909687
  • 08:15 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 08:15 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 08:15 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:15 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:14 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 08:12 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 08:12 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
  • 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetdb2003.codfw.wmnet with reason: host reimage
  • 08:08 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetdb2003.codfw.wmnet with reason: host reimage
  • 08:08 tgr@deploy1002: Finished scap: Backport for Section images: Accept more recommendation types (duration: 07m 51s)
  • 08:01 tgr@deploy1002: tgr: Backport for Section images: Accept more recommendation types synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:00 tgr@deploy1002: Started scap: Backport for Section images: Accept more recommendation types
  • 07:56 ladsgroup@deploy1002: Finished scap: Backport for Revert "Rename wgPageContentLanguage to wgPageViewLanguage" partially (T337634) (duration: 09m 17s)
  • 07:49 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host puppetdb2003.codfw.wmnet with OS bookworm
  • 07:48 ladsgroup@deploy1002: func and ladsgroup: Backport for Revert "Rename wgPageContentLanguage to wgPageViewLanguage" partially (T337634) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:46 ladsgroup@deploy1002: Started scap: Backport for Revert "Rename wgPageContentLanguage to wgPageViewLanguage" partially (T337634)
  • 07:45 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
  • 07:45 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 07:45 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 07:45 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:45 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48633 and previous config saved to /var/cache/conftool/dbconfig/20230530-074445-root.json
  • 07:44 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:42 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 07:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 07:41 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 07:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:40 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:38 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 07:38 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
  • 07:31 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
  • 07:31 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 07:31 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 07:31 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:31 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:30 moritzm: move LDAP permissions for hghani from cn=nda to cn=wmf T322145
  • 07:30 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48632 and previous config saved to /var/cache/conftool/dbconfig/20230530-072941-root.json
  • 07:29 kartik@deploy1002: Finished scap: Backport for testwiki: Enable Section Translation for 9 Wikipedia (T337290) (duration: 09m 38s)
  • 07:28 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 07:28 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:27 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:21 kartik@deploy1002: kartik: Backport for testwiki: Enable Section Translation for 9 Wikipedia (T337290) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:19 kartik@deploy1002: Started scap: Backport for testwiki: Enable Section Translation for 9 Wikipedia (T337290)
  • 07:17 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 07:17 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 07:17 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:17 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:16 kartik@deploy1002: Finished scap: Backport for Undeploy Special:Contribute from unsupported skins (T337366) (duration: 11m 49s)
  • 07:16 moritzm: update bookworm installer to rc4 T330495
  • 07:16 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48630 and previous config saved to /var/cache/conftool/dbconfig/20230530-071436-root.json
  • 07:10 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 07:10 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
  • 07:10 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2006.codfw.wmnet
  • 07:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 07:10 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 07:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:09 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:07 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 07:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 07:07 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 07:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:07 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:06 kartik@deploy1002: kartik: Backport for Undeploy Special:Contribute from unsupported skins (T337366) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 07:06 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:04 kartik@deploy1002: Started scap: Backport for Undeploy Special:Contribute from unsupported skins (T337366)
  • 07:04 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 07:03 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
  • 07:02 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host testvm2006.codfw.wmnet
  • 07:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 07:02 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 07:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:02 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 07:01 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48629 and previous config saved to /var/cache/conftool/dbconfig/20230530-065932-root.json
  • 06:58 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 06:58 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 06:57 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 06:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 06:51 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 06:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 06:50 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - slyngshede@cumin1001"
  • 06:48 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 06:48 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48628 and previous config saved to /var/cache/conftool/dbconfig/20230530-064427-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48625 and previous config saved to /var/cache/conftool/dbconfig/20230530-062922-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48624 and previous config saved to /var/cache/conftool/dbconfig/20230530-061417-root.json
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48623 and previous config saved to /var/cache/conftool/dbconfig/20230530-055913-root.json
  • 05:43 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'configure' for AS: 62597
  • 05:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 62597
  • 05:41 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nray out of all services on: 1255 hosts
  • 05:40 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Nray out of all services on: 1255 hosts
  • 05:40 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nray out of all services on: 784 hosts
  • 05:40 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Nray out of all services on: 784 hosts
  • 05:28 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Hxi-ctr out of all services on: 784 hosts
  • 05:27 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Hxi-ctr out of all services on: 784 hosts
  • 05:26 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Hxi-ctr out of all services on: 1255 hosts
  • 05:25 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Hxi-ctr out of all services on: 1255 hosts
  • 05:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 62597
  • 05:17 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 62597
  • 04:28 kart_: Updated cxserver to 2023-05-29-112644-production (T337657)
  • 04:28 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 04:27 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 04:24 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 04:24 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 04:21 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 04:20 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.9 (duration: 02m 10s)
  • 03:52 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.11 refs T337525 (duration: 49m 54s)
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.11 refs T337525

2023-05-29

  • 15:19 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: This is being worked on
  • 15:19 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: This is being worked on
  • 14:18 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
  • 14:18 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
  • 13:57 vgutierrez@puppetmaster1001: conftool action : set/weight=10; selector: name=dbproxy.*,dc=eqiad
  • 11:25 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:24 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48618 and previous config saved to /var/cache/conftool/dbconfig/20230529-112242-root.json
  • 11:13 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:13 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48617 and previous config saved to /var/cache/conftool/dbconfig/20230529-110737-root.json
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48616 and previous config saved to /var/cache/conftool/dbconfig/20230529-105233-root.json
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48615 and previous config saved to /var/cache/conftool/dbconfig/20230529-103728-root.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48614 and previous config saved to /var/cache/conftool/dbconfig/20230529-102223-root.json
  • 10:07 vgutierrez: restarting pybal on lvs1018
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48612 and previous config saved to /var/cache/conftool/dbconfig/20230529-100719-root.json
  • 10:05 oblivian@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:05 oblivian@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:05 oblivian@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 10:05 oblivian@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 10:04 oblivian@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:04 oblivian@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:03 oblivian@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 10:03 oblivian@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 10:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 10:02 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 10:00 vgutierrez: restarting pybal on lvs1020
  • 09:59 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 09:58 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 09:56 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 09:55 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48611 and previous config saved to /var/cache/conftool/dbconfig/20230529-095214-root.json
  • 09:52 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 09:51 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 09:50 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 09:49 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 09:45 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 09:45 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48610 and previous config saved to /var/cache/conftool/dbconfig/20230529-093709-root.json
  • 09:31 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 09:31 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 09:30 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 09:29 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 09:13 godog: start partial rollout of cadvisor to eqiad/codfw (~10%) T108027
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48609 and previous config saved to /var/cache/conftool/dbconfig/20230529-090216-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48608 and previous config saved to /var/cache/conftool/dbconfig/20230529-084711-root.json
  • 08:45 godog: delete old raw blocks from thanos - T337236
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48607 and previous config saved to /var/cache/conftool/dbconfig/20230529-083206-root.json
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48606 and previous config saved to /var/cache/conftool/dbconfig/20230529-081702-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48604 and previous config saved to /var/cache/conftool/dbconfig/20230529-080157-root.json
  • 07:57 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:56 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48603 and previous config saved to /var/cache/conftool/dbconfig/20230529-074653-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48602 and previous config saved to /var/cache/conftool/dbconfig/20230529-073148-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48601 and previous config saved to /var/cache/conftool/dbconfig/20230529-071643-root.json
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool sanitarium masters for s1, s2, s3, s5 T337446', diff saved to https://phabricator.wikimedia.org/P48598 and previous config saved to /var/cache/conftool/dbconfig/20230529-051043-root.json

2023-05-28

  • 13:19 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 13:17 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 13:16 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 13:16 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 06:12 marostegui: Change innodb_fast_shutdown to 0 on db1154 before downgrading T337446

2023-05-27

  • 21:40 Amir1: insert into templatelinks (tl_from, tl_from_namespace, tl_target_id) values (686, 0, 199); on db1154:3113 (T337446)
  • 17:42 godog: silence systemd state alert flapping on stat1009 until Monday
  • 00:03 tzatziki: removing 1 file for legal compliance

2023-05-26

  • 23:48 tzatziki: removing 2 files for legal compliance
  • 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:47 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:47 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:24 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:24 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:21 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:21 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:15 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:15 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:26 demon@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.10 refs T330216
  • 17:38 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.10 refs T330216 (duration: 06m 10s)
  • 17:31 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.10 refs T330216
  • 16:37 jbond@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetboard2003.codfw.wmnet with OS bookworm
  • 16:36 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetboard1003.eqiad.wmnet with OS bookworm
  • 15:54 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:54 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
  • 15:52 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
  • 15:50 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 15:41 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetboard2003.codfw.wmnet with OS bookworm
  • 15:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:40 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetboard1003.eqiad.wmnet with OS bookworm
  • 15:38 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:34 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 15:34 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 15:31 nskaggs@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
  • 15:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:08 nskaggs@cumin1001: START - Cookbook sre.wikireplicas.update-views
  • 14:26 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: cluster=videoscaler,dc=eqiad,name=parse.*
  • 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=parse.*
  • 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name="parse.*"
  • 14:25 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name="parse.*"
  • 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard1003.eqiad.wmnet
  • 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
  • 14:06 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
  • 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard1003.eqiad.wmnet on all recursors
  • 14:06 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard1003.eqiad.wmnet on all recursors
  • 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:06 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
  • 14:05 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard1003.eqiad.wmnet - jbond@cumin1001"
  • 14:03 jbond@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard2003.codfw.wmnet
  • 14:03 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
  • 14:03 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
  • 14:02 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 14:02 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host puppetboard1003.eqiad.wmnet
  • 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard2003.codfw.wmnet on all recursors
  • 14:02 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard2003.codfw.wmnet on all recursors
  • 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:02 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
  • 14:01 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM puppetboard2003.codfw.wmnet - jbond@cumin2002"
  • 13:58 jbond@cumin2002: START - Cookbook sre.dns.netbox
  • 13:58 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetboard2003.codfw.wmnet
  • 13:58 jbond@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb2003.codfw.wmnet
  • 13:58 jbond@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:56 jbond@cumin2002: START - Cookbook sre.dns.netbox
  • 13:56 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb2003.codfw.wmnet
  • 13:56 jbond@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb1003.eqiad.wmnet
  • 13:56 jbond@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:55 jbond@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host puppetdb2003.codfw.wmnet
  • 13:55 jbond@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:52 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:51 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 13:46 jbond@cumin2002: START - Cookbook sre.dns.netbox
  • 13:46 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host puppetdb2003.codfw.wmnet
  • 13:45 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 13:45 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host puppetdb1003.eqiad.wmnet
  • 13:13 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:13 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add the new pybal IPs at edge-only sites - bblack@cumin1001"
  • 13:12 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add the new pybal IPs at edge-only sites - bblack@cumin1001"
  • 13:06 bblack@cumin1001: START - Cookbook sre.dns.netbox
  • 12:47 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 12:43 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:43 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add rest of eqiad+codfw pybal IPs - bblack@cumin1001"
  • 12:41 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add rest of eqiad+codfw pybal IPs - bblack@cumin1001"
  • 12:39 bblack@cumin1001: START - Cookbook sre.dns.netbox
  • 12:21 hashar@deploy1002: Finished deploy [gerrit/gerrit@0932557]: wm-patch-demo: do not return runs when there are no wikis | T332474 (duration: 00m 08s)
  • 12:21 hashar@deploy1002: Started deploy [gerrit/gerrit@0932557]: wm-patch-demo: do not return runs when there are no wikis | T332474
  • 11:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 11:35 hashar@deploy1002: Finished deploy [gerrit/gerrit@c490ae6]: wm-patch-demo: link to other patches, use WARNING to prevent chipset collapsing | T332474 (duration: 00m 08s)
  • 11:35 hashar@deploy1002: Started deploy [gerrit/gerrit@c490ae6]: wm-patch-demo: link to other patches, use WARNING to prevent chipset collapsing | T332474
  • 10:54 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 10:54 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 10:38 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 10:27 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 09:54 effie: pool parse1013-parse1016 to the jobrunner cluster - T329366
  • 09:29 jbond: disable puppet fleet wide to deploy minor puppet change https://gerrit.wikimedia.org/r/c/operations/puppet/+/923353
  • 09:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1016.eqiad.wmnet with OS buster
  • 09:26 effie: parse1013-parse1016 have been depooled and removed from the parsoid-php service - T329366
  • 09:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1014.eqiad.wmnet with OS buster
  • 09:24 jnuche@deploy1002: Installation of scap version "4.52.3" completed for 596 hosts
  • 09:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1013.eqiad.wmnet with OS buster
  • 09:23 jnuche@deploy1002: Installing scap version "4.52.3" for 596 hosts
  • 09:13 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 09:13 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 09:08 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parse1015.eqiad.wmnet with OS buster
  • 08:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1016.eqiad.wmnet with reason: host reimage
  • 08:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1014.eqiad.wmnet with reason: host reimage
  • 08:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1013.eqiad.wmnet with reason: host reimage
  • 08:54 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on parse1015.eqiad.wmnet with reason: host reimage
  • 08:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1016.eqiad.wmnet with reason: host reimage
  • 08:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1015.eqiad.wmnet with reason: host reimage
  • 08:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1014.eqiad.wmnet with reason: host reimage
  • 08:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1013.eqiad.wmnet with reason: host reimage
  • 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1016.eqiad.wmnet with OS buster
  • 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1015.eqiad.wmnet with OS buster
  • 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1014.eqiad.wmnet with OS buster
  • 08:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host parse1013.eqiad.wmnet with OS buster
  • 08:10 jiji@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=parse101[3-6].eqiad.wmnet
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48591 and previous config saved to /var/cache/conftool/dbconfig/20230526-075903-root.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48590 and previous config saved to /var/cache/conftool/dbconfig/20230526-075809-root.json
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48589 and previous config saved to /var/cache/conftool/dbconfig/20230526-074358-root.json
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48588 and previous config saved to /var/cache/conftool/dbconfig/20230526-074304-root.json
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48587 and previous config saved to /var/cache/conftool/dbconfig/20230526-072854-root.json
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48586 and previous config saved to /var/cache/conftool/dbconfig/20230526-072759-root.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48585 and previous config saved to /var/cache/conftool/dbconfig/20230526-071349-root.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48584 and previous config saved to /var/cache/conftool/dbconfig/20230526-071255-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48583 and previous config saved to /var/cache/conftool/dbconfig/20230526-065844-root.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48582 and previous config saved to /var/cache/conftool/dbconfig/20230526-065750-root.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48581 and previous config saved to /var/cache/conftool/dbconfig/20230526-064340-root.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48580 and previous config saved to /var/cache/conftool/dbconfig/20230526-064245-root.json
  • 06:42 elukey: `apt-get clean` on stat1008 to clean up some space in the root partition
  • 06:36 elukey: `truncate /var/log/kerberos/krb5kdc.log -s 10g` on krb1001 to keep the root partition from filling up
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48579 and previous config saved to /var/cache/conftool/dbconfig/20230526-062835-root.json
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48578 and previous config saved to /var/cache/conftool/dbconfig/20230526-062741-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48577 and previous config saved to /var/cache/conftool/dbconfig/20230526-061330-root.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48576 and previous config saved to /var/cache/conftool/dbconfig/20230526-061236-root.json
  • 03:51 fab@deploy1002: Finished deploy [airflow-dags/research@77cf676]: (no justification provided) (duration: 00m 17s)
  • 03:51 fab@deploy1002: Started deploy [airflow-dags/research@77cf676]: (no justification provided)

2023-05-25

  • 22:14 zabe@deploy1002: Finished scap: Backport for Replace deprecated Hooks::runWithoutAbort (T335536), BannerRenderer: Make sure the language variant is valid (T337427) (duration: 09m 14s)
  • 22:07 zabe@deploy1002: zabe and ladsgroup: Backport for Replace deprecated Hooks::runWithoutAbort (T335536), BannerRenderer: Make sure the language variant is valid (T337427) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:05 zabe@deploy1002: Started scap: Backport for Replace deprecated Hooks::runWithoutAbort (T335536), BannerRenderer: Make sure the language variant is valid (T337427)
  • 21:26 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@77cf676]: (no justification provided) (duration: 00m 08s)
  • 21:25 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@77cf676]: (no justification provided)
  • 20:47 TheresNoTime: close UTC late backport
  • 20:47 samtar@deploy1002: Finished scap: Backport for Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515) (duration: 08m 34s)
  • 20:40 samtar@deploy1002: samtar and matmarex: Backport for Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:38 samtar@deploy1002: Started scap: Backport for Manual backport of OOUI change I63293edd62 (tab dialog fix) (T337515)
  • 20:32 samtar@deploy1002: Finished scap: Backport for Use document feature classes to extract A/B test state (T335972) (duration: 10m 58s)
  • 20:22 samtar@deploy1002: jdrewniak and samtar: Backport for Use document feature classes to extract A/B test state (T335972) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:21 samtar@deploy1002: Started scap: Backport for Use document feature classes to extract A/B test state (T335972)
  • 20:13 samtar@deploy1002: Finished scap: Backport for [prod] Configure logging for the CampaignEvents channel (T337365) (duration: 08m 31s)
  • 20:06 samtar@deploy1002: samtar and daimona: Backport for [prod] Configure logging for the CampaignEvents channel (T337365) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 20:05 samtar@deploy1002: Started scap: Backport for [prod] Configure logging for the CampaignEvents channel (T337365)
  • 19:32 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:32 bblack@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add pybal-low-traffic.svc.codfw.wmnet - bblack@cumin1001"
  • 19:31 bblack@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add pybal-low-traffic.svc.codfw.wmnet - bblack@cumin1001"
  • 19:29 bblack@cumin1001: START - Cookbook sre.dns.netbox
  • 19:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48575 and previous config saved to /var/cache/conftool/dbconfig/20230525-190946-root.json
  • 19:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48574 and previous config saved to /var/cache/conftool/dbconfig/20230525-190859-root.json
  • 18:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48573 and previous config saved to /var/cache/conftool/dbconfig/20230525-185441-root.json
  • 18:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48572 and previous config saved to /var/cache/conftool/dbconfig/20230525-185354-root.json
  • 18:43 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@6b27584]: (no justification provided) (duration: 00m 19s)
  • 18:43 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@6b27584]: (no justification provided)
  • 18:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48571 and previous config saved to /var/cache/conftool/dbconfig/20230525-183937-root.json
  • 18:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48570 and previous config saved to /var/cache/conftool/dbconfig/20230525-183849-root.json
  • 18:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48568 and previous config saved to /var/cache/conftool/dbconfig/20230525-182432-root.json
  • 18:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48567 and previous config saved to /var/cache/conftool/dbconfig/20230525-182345-root.json
  • 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48566 and previous config saved to /var/cache/conftool/dbconfig/20230525-180927-root.json
  • 18:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48565 and previous config saved to /var/cache/conftool/dbconfig/20230525-180840-root.json
  • 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48564 and previous config saved to /var/cache/conftool/dbconfig/20230525-175423-root.json
  • 17:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48563 and previous config saved to /var/cache/conftool/dbconfig/20230525-175335-root.json
  • 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48562 and previous config saved to /var/cache/conftool/dbconfig/20230525-173918-root.json
  • 17:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48561 and previous config saved to /var/cache/conftool/dbconfig/20230525-173831-root.json
  • 17:27 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:27 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entires for migration IPs eqiad row E F switches. - cmooney@cumin1001"
  • 17:26 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS entires for migration IPs eqiad row E F switches. - cmooney@cumin1001"
  • 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48559 and previous config saved to /var/cache/conftool/dbconfig/20230525-172413-root.json
  • 17:23 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48558 and previous config saved to /var/cache/conftool/dbconfig/20230525-172326-root.json
  • 17:15 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:14 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:14 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:14 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:13 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:12 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:09 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 17:08 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 17:07 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 17:06 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 17:05 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 17:03 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 16:39 topranks: adding outbound shaper config on eqsin to codfw transport cct (T328313)
  • 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T336886)', diff saved to https://phabricator.wikimedia.org/P48557 and previous config saved to /var/cache/conftool/dbconfig/20230525-163657-ladsgroup.json
  • 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P48556 and previous config saved to /var/cache/conftool/dbconfig/20230525-162151-ladsgroup.json
  • 16:18 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:18 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:14 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:14 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:11 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e[1,3]-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e3-eqiad uplinks to spine
  • 16:11 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e[1,3]-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e3-eqiad uplinks to spine
  • 16:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on gerrit2002.wikimedia.org with reason: maintenance
  • 16:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on gerrit2002.wikimedia.org with reason: maintenance
  • 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P48555 and previous config saved to /var/cache/conftool/dbconfig/20230525-160645-ladsgroup.json
  • 16:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2002.wikimedia.org with OS bullseye
  • 15:57 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e2-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e2-eqiad uplink from lsw1-f1 to ssw1-f1
  • 15:56 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e2-eqiad.mgmt,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e2-eqiad uplink from lsw1-f1 to ssw1-f1
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T336886)', diff saved to https://phabricator.wikimedia.org/P48553 and previous config saved to /var/cache/conftool/dbconfig/20230525-155139-ladsgroup.json
  • 15:49 dancy@deploy1002: Finished deploy [integration/docroot@dac2b70]: Updated Scap URLs (duration: 00m 07s)
  • 15:49 dancy@deploy1002: Started deploy [integration/docroot@dac2b70]: Updated Scap URLs
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T336886)', diff saved to and previous config saved to /var/cache/conftool/dbconfig/20230525-154927-ladsgroup.json
  • 15:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 15:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T336886)', diff saved to and previous config saved to /var/cache/conftool/dbconfig/20230525-154906-ladsgroup.json
  • 15:44 dancy: dancy@deploy1002 Updated scap URLs on doc.wikimedia.org
  • 15:43 dancy@deploy1002: Finished deploy [integration/docroot@78e6f40]: (no justification provided) (duration: 00m 10s)
  • 15:43 dancy@deploy1002: Started deploy [integration/docroot@78e6f40]: (no justification provided)
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P48552 and previous config saved to /var/cache/conftool/dbconfig/20230525-153359-ladsgroup.json
  • 15:33 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e[1-2]-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
  • 15:33 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e[1-2]-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
  • 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
  • 15:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
  • 15:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 15:27 kartik@deploy1002: Finished scap: Backport for Show Contribute menu item in main menu when Special:Contribute is enabled (T336838) (duration: 07m 01s)
  • 15:22 kartik@deploy1002: kartik: Backport for Show Contribute menu item in main menu when Special:Contribute is enabled (T336838) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 15:21 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr2-eqiad,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr2-eqiad link to ssw1-e1-eqiad
  • 15:20 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cr2-eqiad,lsw1-f1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr2-eqiad link to ssw1-e1-eqiad
  • 15:20 kartik@deploy1002: Started scap: Backport for Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)
  • 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P48551 and previous config saved to /var/cache/conftool/dbconfig/20230525-151853-ladsgroup.json
  • 15:18 kartik@deploy1002: Finished scap: Backport for Show Contribute menu item in main menu when Special:Contribute is enabled (T336838), Show Contribute menu item in main menu when Special:Contribute is enabled (T336838) (duration: 68m 07s)
  • 15:14 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS bullseye
  • 15:10 topranks: Migrating cr1-eqiad downlink to row E/F from lsw1-e1-eqiad et-0/0/48 to ssw1-e1-eqiad et-0/0/31
  • 15:10 mutante: gerrit-replica.wikimedia.org - gerrit2002 - reimaging - scheduled maintenance
  • 15:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: maintenance
  • 15:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: maintenance
  • 15:04 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr1-eqiad,lsw1-e1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
  • 15:04 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cr1-eqiad,lsw1-e1-eqiad.mgmt with reason: Migrate lsw1-e1-eqiad to cr1-eqiad link to ssw1-e1-eqiad
  • 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48550 and previous config saved to /var/cache/conftool/dbconfig/20230525-150347-ladsgroup.json
  • 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48549 and previous config saved to /var/cache/conftool/dbconfig/20230525-145857-ladsgroup.json
  • 14:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 14:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48548 and previous config saved to /var/cache/conftool/dbconfig/20230525-145836-ladsgroup.json
  • 14:54 marostegui: Wikireplicas are lagging behind for the following sections: s1, s2, s5, s7 T337446
  • 14:54 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P48547 and previous config saved to /var/cache/conftool/dbconfig/20230525-144330-ladsgroup.json
  • 14:32 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 14:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['dbproxy1026']
  • 14:29 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1027']
  • 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1027']
  • 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1026']
  • 14:28 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1025']
  • 14:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1024']
  • 14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P48546 and previous config saved to /var/cache/conftool/dbconfig/20230525-142824-ladsgroup.json
  • 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1025']
  • 14:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1024']
  • 14:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1023']
  • 14:28 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbproxy1022']
  • 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
  • 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1023']
  • 14:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1023']
  • 14:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
  • 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
  • 14:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1023']
  • 14:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
  • 14:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
  • 14:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbproxy1022']
  • 14:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1022']
  • 14:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1026']
  • 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=videoscaler
  • 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=jobrunner
  • 14:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072']
  • 14:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver
  • 14:21 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:21 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=eqiad
  • 14:21 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=appserver,dc=eqiad
  • 14:20 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:14 bblack@cumin1001: conftool action : set/pooled=yes; selector: service=parsoid-php,dc=eqiad
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48545 and previous config saved to /var/cache/conftool/dbconfig/20230525-141318-ladsgroup.json
  • 14:12 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:11 kartik@deploy1002: kartik: Backport for Show Contribute menu item in main menu when Special:Contribute is enabled (T336838), Show Contribute menu item in main menu when Special:Contribute is enabled (T336838) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 14:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:10 kartik@deploy1002: Started scap: Backport for Show Contribute menu item in main menu when Special:Contribute is enabled (T336838), Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)
  • 14:09 volans@cumin1001: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling restart_daemons on P{puppetboard2002.codfw.wmnet} and (A:puppetboard)
  • 14:09 volans@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
  • 14:08 volans@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
  • 14:08 volans@cumin1001: START - Cookbook sre.puppetboard.restart-reboot rolling restart_daemons on P{puppetboard2002.codfw.wmnet} and (A:puppetboard)
  • 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48544 and previous config saved to /var/cache/conftool/dbconfig/20230525-140822-ladsgroup.json
  • 14:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:08 kartik@deploy1002: Finished scap: Backport for Show Contribute menu item in main menu when Special:Contribute is enabled (T336838), Show Contribute menu item in main menu when Special:Contribute is enabled (T336838) (duration: 15m 56s)
  • 13:53 kartik@deploy1002: kartik: Backport for Show Contribute menu item in main menu when Special:Contribute is enabled (T336838), Show Contribute menu item in main menu when Special:Contribute is enabled (T336838) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:52 kartik@deploy1002: Started scap: Backport for Show Contribute menu item in main menu when Special:Contribute is enabled (T336838), Show Contribute menu item in main menu when Special:Contribute is enabled (T336838)
  • 13:46 urbanecm@deploy1002: Finished scap: Backport for Change maint script to do work via jobs (duration: 07m 42s)
  • 13:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:38 urbanecm@deploy1002: Started scap: Backport for Change maint script to do work via jobs
  • 13:28 urbanecm@deploy1002: Finished scap: Backport for Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436), Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436) (duration: 09m 06s)
  • 13:24 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:20 urbanecm@deploy1002: urbanecm and matmarex: Backport for Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436), Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:19 urbanecm@deploy1002: Started scap: Backport for Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436), Handle 'prefix' when 'action=edit', even if another extension overrides action (T337436)
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool sanitarium masters for s1, s5, s2, s7', diff saved to https://phabricator.wikimedia.org/P48538 and previous config saved to /var/cache/conftool/dbconfig/20230525-121012-root.json
  • 11:56 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:56 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:54 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:54 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:52 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:51 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:49 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:43 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 11:43 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 11:40 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 11:40 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 11:39 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48537 and previous config saved to /var/cache/conftool/dbconfig/20230525-113914-root.json
  • 11:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 11:38 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 11:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 11:31 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 11:31 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 11:30 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 11:30 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 11:28 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 11:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 11:26 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 11:26 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 11:25 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:25 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 11:25 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:25 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48536 and previous config saved to /var/cache/conftool/dbconfig/20230525-112409-root.json
  • 11:22 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:22 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 11:21 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:20 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 11:15 jbond: update udplog on mwlog server
  • 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48535 and previous config saved to /var/cache/conftool/dbconfig/20230525-110948-root.json
  • 11:09 jbond: upload udplog_1.10_amd64.deb
  • 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48534 and previous config saved to /var/cache/conftool/dbconfig/20230525-110905-root.json
  • 11:05 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:04 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:03 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:03 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:54 klausman@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48533 and previous config saved to /var/cache/conftool/dbconfig/20230525-105443-root.json
  • 10:54 klausman@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 10:54 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
  • 10:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
  • 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48532 and previous config saved to /var/cache/conftool/dbconfig/20230525-105400-root.json
  • 10:53 klausman@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 10:52 klausman@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 10:49 klausman@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 10:49 klausman@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 10:48 klausman@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol2005-dev.wikimedia.org
  • 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:41 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
  • 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48531 and previous config saved to /var/cache/conftool/dbconfig/20230525-103939-root.json
  • 10:39 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2005-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48530 and previous config saved to /var/cache/conftool/dbconfig/20230525-103855-root.json
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48529 and previous config saved to /var/cache/conftool/dbconfig/20230525-103445-root.json
  • 10:32 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 10:24 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2005-dev.wikimedia.org
  • 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48528 and previous config saved to /var/cache/conftool/dbconfig/20230525-102434-root.json
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48527 and previous config saved to /var/cache/conftool/dbconfig/20230525-102351-root.json
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48526 and previous config saved to /var/cache/conftool/dbconfig/20230525-101940-root.json
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48525 and previous config saved to /var/cache/conftool/dbconfig/20230525-100927-root.json
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48524 and previous config saved to /var/cache/conftool/dbconfig/20230525-100846-root.json
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48523 and previous config saved to /var/cache/conftool/dbconfig/20230525-100436-root.json
  • 10:00 kart_: Updated cxserver to 2023-05-25-093623-production (config: language pairs transform fix + T331201)
  • 09:57 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 09:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48522 and previous config saved to /var/cache/conftool/dbconfig/20230525-095423-root.json
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48521 and previous config saved to /var/cache/conftool/dbconfig/20230525-095341-root.json
  • 09:51 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 09:51 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48520 and previous config saved to /var/cache/conftool/dbconfig/20230525-094931-root.json
  • 09:48 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 09:48 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48519 and previous config saved to /var/cache/conftool/dbconfig/20230525-093918-root.json
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48518 and previous config saved to /var/cache/conftool/dbconfig/20230525-093426-root.json
  • 09:32 apergos: running from dumpsdata1004 via ariel login screen session, as root, rsync with bwlimit 100000 to dumpsdata1006, copying all public xml dumps data
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48517 and previous config saved to /var/cache/conftool/dbconfig/20230525-092413-root.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48516 and previous config saved to /var/cache/conftool/dbconfig/20230525-091922-root.json
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2179', diff saved to https://phabricator.wikimedia.org/P48515 and previous config saved to /var/cache/conftool/dbconfig/20230525-091132-root.json
  • 09:10 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48514 and previous config saved to /var/cache/conftool/dbconfig/20230525-090417-root.json
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48513 and previous config saved to /var/cache/conftool/dbconfig/20230525-084912-root.json
  • 08:32 elukey: revoke kafka_mirror_maker TLS cert (cergen based), remove old cergen certs from puppet private - T337248
  • 07:52 matthiasmullie: UTC morning backports done
  • 07:51 mlitn@deploy1002: Finished scap: Backport for Change maint script to do work via jobs (T322872) (duration: 16m 12s)
  • 07:37 mlitn@deploy1002: mlitn: Backport for Change maint script to do work via jobs (T322872) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 07:35 mlitn@deploy1002: Started scap: Backport for Change maint script to do work via jobs (T322872)
  • 07:18 mlitn@deploy1002: Finished scap: Backport for [WikibaseMediaInfo] Add 'main subject of' property (duration: 14m 02s)
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158', diff saved to https://phabricator.wikimedia.org/P48511 and previous config saved to /var/cache/conftool/dbconfig/20230525-071719-root.json
  • 07:10 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 07:06 mlitn@deploy1002: mlitn: Backport for [WikibaseMediaInfo] Add 'main subject of' property synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 07:04 mlitn@deploy1002: Started scap: Backport for [WikibaseMediaInfo] Add 'main subject of' property
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1196', diff saved to https://phabricator.wikimedia.org/P48509 and previous config saved to /var/cache/conftool/dbconfig/20230525-064418-root.json
  • 06:09 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1156', diff saved to https://phabricator.wikimedia.org/P48506 and previous config saved to /var/cache/conftool/dbconfig/20230525-055734-root.json
  • 05:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 9 hosts with reason: T337446
  • 05:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 9 hosts with reason: T337446
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161', diff saved to https://phabricator.wikimedia.org/P48504 and previous config saved to /var/cache/conftool/dbconfig/20230525-055236-root.json
  • 05:48 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:48 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:41 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:36 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:36 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110', diff saved to https://phabricator.wikimedia.org/P48503 and previous config saved to /var/cache/conftool/dbconfig/20230525-051923-root.json
  • 02:14 eileen: civicrm upgraded from b8cab6f6 to 415aa7e5

2023-05-24

  • 21:18 urbanecm@deploy1002: Finished scap: Backport for [Growth] Deploy Personalized praise to pilot wikis with notifications (T334630) (duration: 09m 40s)
  • 21:10 urbanecm@deploy1002: urbanecm: Backport for [Growth] Deploy Personalized praise to pilot wikis with notifications (T334630) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:08 urbanecm@deploy1002: Started scap: Backport for [Growth] Deploy Personalized praise to pilot wikis with notifications (T334630)
  • 20:55 samtar@deploy1002: Finished scap: Backport for ipInfo.hooks: Use wgRelevantUserName (T337373) (duration: 08m 15s)
  • 20:48 samtar@deploy1002: samtar: Backport for ipInfo.hooks: Use wgRelevantUserName (T337373) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:47 samtar@deploy1002: Started scap: Backport for ipInfo.hooks: Use wgRelevantUserName (T337373)
  • 20:25 samtar@deploy1002: Finished scap: Backport for ipInfo.hooks: Use wgRelevantUserName (T337373) (duration: 08m 31s)
  • 20:18 samtar@deploy1002: samtar: Backport for ipInfo.hooks: Use wgRelevantUserName (T337373) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:16 samtar@deploy1002: Started scap: Backport for ipInfo.hooks: Use wgRelevantUserName (T337373)
  • 20:15 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:08 ayounsi@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1026.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1025.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:12 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.9 refs T330216 (duration: 06m 00s)
  • 19:06 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.9 refs T330216
  • 18:55 demon@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.10 refs T330216 (duration: 06m 00s)
  • 18:49 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.10 refs T330216
  • 18:48 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1025.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:48 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:47 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:32 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1149.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:22 ejegg: civicrm upgraded from 4251dfa1 to b8cab6f6
  • 16:54 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@1603ecf]: Deploying T336800 on platform_eng Airflow instance (duration: 00m 09s)
  • 16:54 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@1603ecf]: Deploying T336800 on platform_eng Airflow instance
  • 16:05 elukey: move kafka mirror on kafka main brokers to PKI - T337248
  • 16:01 urbanecm@deploy1002: Finished scap: Backport for Personalized praise: Add instrumentation (T325117), Personalized praise: Add instrumentation (T325117) (duration: 08m 33s)
  • 15:56 elukey: move kafka mirror on kafka jumbo brokers to PKI - T337248
  • 15:54 urbanecm@deploy1002: urbanecm: Backport for Personalized praise: Add instrumentation (T325117), Personalized praise: Add instrumentation (T325117) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 15:52 urbanecm@deploy1002: Started scap: Backport for Personalized praise: Add instrumentation (T325117), Personalized praise: Add instrumentation (T325117)
  • 15:47 ejegg: payments-wiki upgraded from e02bc7c5 to c2f9f8b5
  • 15:39 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@24ff363] (duration: 01m 35s)
  • 15:38 ejegg: standalone SmashPig upgraded from 5460dbe2 to db23b998
  • 15:37 aqu@deploy1002: Started deploy [analytics/refinery@24ff363] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@24ff363]
  • 15:37 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363] (thin): Regular analytics weekly train THIN [analytics/refinery@24ff363] (duration: 00m 04s)
  • 15:37 aqu@deploy1002: Started deploy [analytics/refinery@24ff363] (thin): Regular analytics weekly train THIN [analytics/refinery@24ff363]
  • 15:35 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:32 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:31 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:31 aqu@deploy1002: Finished deploy [analytics/refinery@24ff363]: Regular analytics weekly train [analytics/refinery@24ff363] (duration: 06m 13s)
  • 15:31 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:30 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:26 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:26 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:25 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:25 aqu@deploy1002: Started deploy [analytics/refinery@24ff363]: Regular analytics weekly train [analytics/refinery@24ff363]
  • 15:24 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:22 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:22 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:21 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:18 aqu: analytics-refinery, about to deploy
  • 15:09 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:30 volans@cumin2002: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling restart_daemons on P{puppetboard2002.codfw.wmnet} and (A:puppetboard)
  • 14:30 volans@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
  • 14:30 volans@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
  • 14:29 volans@cumin2002: START - Cookbook sre.puppetboard.restart-reboot rolling restart_daemons on P{puppetboard2002.codfw.wmnet} and (A:puppetboard)
  • 14:26 volans@cumin2002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 14:26 volans@cumin2002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 14:19 urbanecm@deploy1002: Finished scap: Backport for Enable DiscussionTools newtopictool on fiwiki (T317375) (duration: 12m 11s)
  • 14:13 hashar@deploy1002: Finished deploy [gerrit/gerrit@2d719f3]: wm-patch-demo: initial implementation | T332474 (duration: 00m 07s)
  • 14:13 hashar@deploy1002: Started deploy [gerrit/gerrit@2d719f3]: wm-patch-demo: initial implementation | T332474
  • 14:08 urbanecm@deploy1002: urbanecm and matmarex: Backport for Enable DiscussionTools newtopictool on fiwiki (T317375) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 14:06 urbanecm@deploy1002: Started scap: Backport for Enable DiscussionTools newtopictool on fiwiki (T317375)
  • 14:06 urbanecm@deploy1002: Finished scap: Backport for MultiPaneDialog: remove attribute hidden instead of class (T337256), Add maint script to opt out active users from the new topic tool (T317375), Define $maintClass in maintenance script for compatibility (T317375), NewTopicOptOutActiveUsers: Skip bot users etc. (T317375) (duration: 09m 21s)
  • 13:58 urbanecm@deploy1002: matmarex and urbanecm and sgimeno: Backport for MultiPaneDialog: remove attribute hidden instead of class (T337256), Add maint script to opt out active users from the new topic tool (T317375), Define $maintClass in maintenance script for compatibility (T317375), NewTopicOptOutActiveUsers: Skip bot users etc. (T317375) synced t
  • 13:56 urbanecm@deploy1002: Started scap: Backport for MultiPaneDialog: remove attribute hidden instead of class (T337256), Add maint script to opt out active users from the new topic tool (T317375), Define $maintClass in maintenance script for compatibility (T317375), NewTopicOptOutActiveUsers: Skip bot users etc. (T317375)
  • 13:55 urbanecm@deploy1002: Finished scap: Backport for [Growth] Add mediawiki.mentor_dashboard.interaction (T325117) (duration: 07m 06s)
  • 13:48 urbanecm@deploy1002: Started scap: Backport for [Growth] Add mediawiki.mentor_dashboard.interaction (T325117)
  • 13:36 samtar@deploy1002: Finished scap: Backport for Enable Kartographer Nearby on remaining wikis (T336834) (duration: 08m 04s)
  • 13:29 samtar@deploy1002: samtar and wmde-fisch: Backport for Enable Kartographer Nearby on remaining wikis (T336834) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:28 samtar@deploy1002: Started scap: Backport for Enable Kartographer Nearby on remaining wikis (T336834)
  • 13:26 samtar@deploy1002: Finished scap: Backport for [cirrus] Fix typo in config var (duration: 10m 15s)
  • 13:17 samtar@deploy1002: samtar and dcausse: Backport for [cirrus] Fix typo in config var synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:16 samtar@deploy1002: Started scap: Backport for [cirrus] Fix typo in config var
  • 13:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:14 samtar@deploy1002: Finished scap: Backport for arclamp: switch redis server to arclamp1001 (T327277) (duration: 07m 53s)
  • 13:07 samtar@deploy1002: herron and samtar: Backport for arclamp: switch redis server to arclamp1001 (T327277) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:07 xSavitar: tools.codesearch Deployed https://gerrit.wikimedia.org/r/c/labs/codesearch/+/909258 and also restarted tool instances because the core search backend was dead.
  • 13:06 samtar@deploy1002: Started scap: Backport for arclamp: switch redis server to arclamp1001 (T327277)
  • 12:55 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript findBadBlobs --wiki nowiki --revisions 5227369 --mark T337392` T337392
  • 12:47 tgr_: running changeWikiConfig.php on Growth pilot wikis for T337348
  • 10:56 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-codfw cluster: Reboot kafka nodes
  • 09:42 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2448.codfw.wmnet
  • 09:42 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw2448.codfw.wmnet
  • 09:04 dcausse@deploy1002: Finished deploy [airflow-dags/search@c08e884]: search: build and use a smaller cirrus index dataset (duration: 00m 17s)
  • 09:04 dcausse@deploy1002: Started deploy [airflow-dags/search@c08e884]: search: build and use a smaller cirrus index dataset
  • 08:52 claime: repooling mw2248.codfw.wmnet - T334429
  • 08:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:51 akosiaris@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka main-codfw cluster: Reboot kafka nodes
  • 08:50 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 08:49 marostegui: Stop mariadb on db1154 (sanitarium) there will be lag on clouddb* hosts
  • 08:36 urbanecm@deploy1002: Finished scap: Backport for Migrate GrowthExperiments config to its own file (T308932) (duration: 07m 20s)
  • 08:28 urbanecm@deploy1002: Started scap: Backport for Migrate GrowthExperiments config to its own file (T308932)
  • 07:42 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 07:42 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 07:41 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 07:40 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 07:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:11 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:11 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:02 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:02 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 05:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 136106
  • 05:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 136106
  • 01:19 mutante: contint2001 - jenkins started again
  • 01:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
  • 01:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
  • 00:45 mutante: short maintenance on main contint server (jenkins)
  • 00:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
  • 00:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
  • 00:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
  • 00:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on contint2001.wikimedia.org with reason: maintenance
  • 00:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint2001.wikimedia.org with reason: maintenance
  • 00:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint2001.wikimedia.org with reason: maintenance
  • 00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint2002.wikimedia.org with reason: maintenance
  • 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint2002.wikimedia.org with reason: maintenance
  • 00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint1002.wikimedia.org with reason: maintenance
  • 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on contint1002.wikimedia.org with reason: maintenance

2023-05-23

  • 23:52 mutante: releases1002 - jenkins service running again, this is the active host behind releases-jenkins.wikimedia.org - maintenance for releases* done
  • 23:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on releases1002.eqiad.wmnet with reason: maintenance
  • 23:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on releases1002.eqiad.wmnet with reason: maintenance
  • 23:41 mutante: releases1002 (releases.wikimedia.org) stopping jenkins for maintenance
  • 23:30 mutante: contint*, releases* - maintenance - changing UID of jenkins user - jenkins will be stopped for a little bit, releases-jenkins is first though - T324659
  • 22:00 eileen: civicrm upgraded from 11538e23 to 4251dfa1
  • 21:26 ejegg: payments-wiki upgraded from a7567c6a to e02bc7c5
  • 21:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 21:06 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 21:02 TheresNoTime: close UTC late backport window
  • 21:01 samtar@deploy1002: Finished scap: Backport for Turn on the A/B test for testwiki (T336969) (duration: 11m 47s)
  • 21:01 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 21:01 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 21:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 21:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:51 samtar@deploy1002: ksarabia and samtar: Backport for Turn on the A/B test for testwiki (T336969) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 20:50 samtar@deploy1002: Started scap: Backport for Turn on the A/B test for testwiki (T336969)
  • 20:48 samtar@deploy1002: Finished scap: Backport for Remove centraluserid dependency in ABRequirement.php (T336969), Remove centraluserid dependency in ABRequirement.php (T336969) (duration: 11m 20s)
  • 20:38 samtar@deploy1002: samtar: Backport for Remove centraluserid dependency in ABRequirement.php (T336969), Remove centraluserid dependency in ABRequirement.php (T336969) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:37 ejegg: civicrm upgraded from efe25c9b to 11538e23
  • 20:37 samtar@deploy1002: Started scap: Backport for Remove centraluserid dependency in ABRequirement.php (T336969), Remove centraluserid dependency in ABRequirement.php (T336969)
  • 20:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:10 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:10 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:58 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:53 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:50 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:46 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1023.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:42 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:42 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:42 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:41 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:41 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbproxy102{2..7} - jclark@cumin1001"
  • 19:39 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbproxy102{2..7} - jclark@cumin1001"
  • 19:36 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1027
  • 19:35 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1027
  • 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1026
  • 19:35 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1026
  • 19:34 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1025
  • 19:33 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1025
  • 19:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:31 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:31 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1025
  • 19:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1025
  • 19:30 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1024
  • 19:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
  • 19:27 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1024
  • 19:27 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
  • 19:27 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dbproxy1024
  • 19:27 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
  • 19:27 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
  • 19:25 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
  • 19:25 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1022
  • 19:25 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.10 refs T330216
  • 19:24 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1022
  • 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:24 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:18 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:18 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:10 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:09 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:29 inflatador: bking@cumin1001 rolling restart of codfw wdqs public hosts T337327
  • 18:26 ryankemper: [WDQS] T337327 Deployed new, hopefully-working rule after addressing previous syntax error (unescaped `"`). See `/srv/private` commit `6e2f5ab19427902994bb9d03d28277252f021474`
  • 18:16 ryankemper: [WDQS] Rolled back requestctl rule
  • 18:12 ryankemper: [WDQS] T337327 New rule in place to ban potential source of WDQS codfw outage. Rolling restart will be done in a couple minutes to [attempt to] restore service availability
  • 17:05 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 17:05 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 17:03 sbassett: Deployed updated security mitigation for T336027 and T333140
  • 17:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-eqiad cluster: Reboot kafka nodes
  • 16:58 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:58 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:50 sbassett: Deployed updated security mitigation for T336027, part 2
  • 16:50 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:49 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:43 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Homer Release v0.6.2 with updated wmf-plugin - cmooney@cumin1001
  • 16:43 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:43 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:42 sbassett: Deployed updated security mitigation for T336027
  • 16:41 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Homer Release v0.6.2 with updated wmf-plugin - cmooney@cumin1001
  • 16:31 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: EventStreamConfig - Rename page content change enrich error stream to match convention - T336656 (duration: 06m 58s)
  • 16:22 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys T322937 (duration: 36m 02s)
  • 15:56 topranks: moving lvs1018 connection to rack E1 from lsw1-e1-eqiad to ssw1-e1-eqiad T322937
  • 15:46 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys T322937
  • 15:46 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:45 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:45 sukhe: stop pybal on lvs1018: T322937
  • 15:38 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host releases2003.codfw.wmnet with OS bullseye
  • 15:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:24 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on releases2003.codfw.wmnet with reason: host reimage
  • 15:22 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:22 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:22 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:21 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:21 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 15:21 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:21 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 15:21 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:21 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 15:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:20 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on releases2003.codfw.wmnet with reason: host reimage
  • 15:20 jayme@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:19 jayme@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 15:16 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:14 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:14 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:03 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host releases2003.codfw.wmnet with OS bullseye
  • 15:02 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host releases1003.eqiad.wmnet with OS bullseye
  • 15:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:00 akosiaris@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka main-eqiad cluster: Reboot kafka nodes
  • 14:58 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:58 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:57 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:57 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:51 moritzm: removed imagemagick 8:6.9.10.23+dfsg-2.1+deb10u1+wmf1 from apt.wikimedia.org/buster-wikimedia now that the Thumbor spec tests have been upgraded to match latest patches
  • 14:49 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on releases1003.eqiad.wmnet with reason: host reimage
  • 14:46 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on releases1003.eqiad.wmnet with reason: host reimage
  • 14:36 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host releases1003.eqiad.wmnet with OS bullseye
  • 14:33 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:30 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 14:05 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts kafkamon2002.codfw.wmnet
  • 14:05 herron@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:05 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:05 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
  • 14:04 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host releases2003.codfw.wmnet
  • 14:04 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases2003.codfw.wmnet - eoghan@cumin1001"
  • 14:04 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
  • 14:03 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases2003.codfw.wmnet - eoghan@cumin1001"
  • 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) releases2003.codfw.wmnet on all recursors
  • 14:02 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache releases2003.codfw.wmnet on all recursors
  • 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:02 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases2003.codfw.wmnet - eoghan@cumin1001"
  • 14:01 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases2003.codfw.wmnet - eoghan@cumin1001"
  • 14:01 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 14:00 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:57 eoghan@cumin1001: START - Cookbook sre.dns.netbox
  • 13:57 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host releases2003.codfw.wmnet
  • 13:56 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon2002.codfw.wmnet
  • 13:56 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafkamon1002.eqiad.wmnet
  • 13:55 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:55 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafkamon1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
  • 13:54 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafkamon1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
  • 13:50 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 13:50 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host releases1003.eqiad.wmnet
  • 13:50 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases1003.eqiad.wmnet - eoghan@cumin1001"
  • 13:47 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM releases1003.eqiad.wmnet - eoghan@cumin1001"
  • 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) releases1003.eqiad.wmnet on all recursors
  • 13:46 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache releases1003.eqiad.wmnet on all recursors
  • 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:46 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases1003.eqiad.wmnet - eoghan@cumin1001"
  • 13:46 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon1002.eqiad.wmnet
  • 13:45 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM releases1003.eqiad.wmnet - eoghan@cumin1001"
  • 13:45 hoo@deploy1002: Finished scap: Backport for Restore targets declarations temporarily (T336956), Restore targets declarations temporarily (T336956) (duration: 12m 49s)
  • 13:44 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 13:44 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 13:43 eoghan@cumin1001: START - Cookbook sre.dns.netbox
  • 13:43 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host releases1003.eqiad.wmnet
  • 13:33 hoo@deploy1002: hoo: Backport for Restore targets declarations temporarily (T336956), Restore targets declarations temporarily (T336956) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:32 hoo@deploy1002: Started scap: Backport for Restore targets declarations temporarily (T336956), Restore targets declarations temporarily (T336956)
  • 13:11 akosiaris@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
  • 12:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:21 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 11:56 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 11:56 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 11:55 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 11:55 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 11:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 11:54 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:40 akosiaris@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
  • 10:29 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
  • 10:21 akosiaris: reboot rdb1011 for kernel upgrades. ORES in codfw will have a 5m downtime. Other things that might be impacted (but won't): changeprop/cpjobqueue/api-gateway/docker-registry/filebackend.php
  • 10:21 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
  • 10:13 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
  • 10:10 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1001.eqiad.wmnet
  • 10:07 akosiaris: reboot rdb2009 for kernel upgrades. ORES in codfw will have a 5m downtime. Other things that might be impacted (but won't): changeprop/cpjobqueue/api-gateway/docker-registry/filebackend.php
  • 10:05 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
  • 10:02 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1001.eqiad.wmnet
  • 09:59 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1002.eqiad.wmnet
  • 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48493 and previous config saved to /var/cache/conftool/dbconfig/20230523-095720-root.json
  • 09:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:56 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:55 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 09:55 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 09:51 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1002.eqiad.wmnet
  • 09:50 stevemunene: reboot an-test-master1002.eqiad.wmnet December 2022 Buster reboots T325132
  • 09:49 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1003.eqiad.wmnet
  • 09:42 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1003.eqiad.wmnet
  • 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48492 and previous config saved to /var/cache/conftool/dbconfig/20230523-094216-root.json
  • 09:42 stevemunene: reboot an-test-worker1003.eqiad.wmnet December 2022 Buster reboots T325132
  • 09:41 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-coord1001.eqiad.wmnet
  • 09:34 stevemunene@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-coord1001.eqiad.wmnet
  • 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48491 and previous config saved to /var/cache/conftool/dbconfig/20230523-092711-root.json
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48490 and previous config saved to /var/cache/conftool/dbconfig/20230523-091207-root.json
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48489 and previous config saved to /var/cache/conftool/dbconfig/20230523-085702-root.json
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48488 and previous config saved to /var/cache/conftool/dbconfig/20230523-085246-root.json
  • 08:44 hashar@deploy1002: Finished deploy [gerrit/gerrit@69bc27c]: wm-zuul-status: show reload immediately | T214068 (duration: 00m 07s)
  • 08:44 hashar@deploy1002: Started deploy [gerrit/gerrit@69bc27c]: wm-zuul-status: show reload immediately | T214068
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48487 and previous config saved to /var/cache/conftool/dbconfig/20230523-084157-root.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48486 and previous config saved to /var/cache/conftool/dbconfig/20230523-083741-root.json
  • 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1122.eqiad.wmnet
  • 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1122.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 08:35 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1122.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 08:32 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 08:27 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1122.eqiad.wmnet
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48485 and previous config saved to /var/cache/conftool/dbconfig/20230523-082653-root.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48484 and previous config saved to /var/cache/conftool/dbconfig/20230523-082237-root.json
  • 08:14 kartik@deploy1002: Finished scap: Backport for Special:Contribute: Correct language code for Albanian (T327868) (duration: 08m 37s)
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1119 from dbctl T337206', diff saved to https://phabricator.wikimedia.org/P48483 and previous config saved to /var/cache/conftool/dbconfig/20230523-081342-marostegui.json
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1024 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48482 and previous config saved to /var/cache/conftool/dbconfig/20230523-081148-root.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48481 and previous config saved to /var/cache/conftool/dbconfig/20230523-080732-root.json
  • 08:07 kartik@deploy1002: kartik: Backport for Special:Contribute: Correct language code for Albanian (T327868) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 08:05 kartik@deploy1002: Started scap: Backport for Special:Contribute: Correct language code for Albanian (T327868)
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48480 and previous config saved to /var/cache/conftool/dbconfig/20230523-075227-root.json
  • 07:51 hashar@deploy1002: Finished deploy [gerrit/gerrit@d151775]: wm-zuul-status: offer to reload on CI completion | T214068 (duration: 00m 07s)
  • 07:51 hashar@deploy1002: Started deploy [gerrit/gerrit@d151775]: wm-zuul-status: offer to reload on CI completion | T214068
  • 07:47 marostegui@deploy1002: Finished scap: Backport for Revert "db-production.php: Disable writes in es5" (duration: 07m 19s)
  • 07:44 hashar@deploy1002: Finished deploy [gerrit/gerrit@e815301]: wm-zuul-status: offer to reload on CI completion | T214068 (duration: 00m 07s)
  • 07:44 hashar@deploy1002: Started deploy [gerrit/gerrit@e815301]: wm-zuul-status: offer to reload on CI completion | T214068
  • 07:41 marostegui@deploy1002: marostegui: Backport for Revert "db-production.php: Disable writes in es5" synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:39 marostegui@deploy1002: Started scap: Backport for Revert "db-production.php: Disable writes in es5"
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1024 T337285', diff saved to https://phabricator.wikimedia.org/P48479 and previous config saved to /var/cache/conftool/dbconfig/20230523-073841-root.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48478 and previous config saved to /var/cache/conftool/dbconfig/20230523-073722-root.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1023 to es5 primary T337285', diff saved to https://phabricator.wikimedia.org/P48477 and previous config saved to /var/cache/conftool/dbconfig/20230523-073710-root.json
  • 07:36 marostegui: Starting es5 eqiad failover from es1024 to es1023 T337285
  • 07:25 marostegui@deploy1002: Finished scap: Backport for db-production.php: Disable writes in es5 (T337285) (duration: 07m 16s)
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48476 and previous config saved to /var/cache/conftool/dbconfig/20230523-072218-root.json
  • 07:19 marostegui@deploy1002: marostegui: Backport for db-production.php: Disable writes in es5 (T337285) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 T337285
  • 07:17 marostegui@deploy1002: Started scap: Backport for db-production.php: Disable writes in es5 (T337285)
  • 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 T337285
  • 07:14 kartik@deploy1002: Finished scap: Backport for Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868) (duration: 09m 42s)
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48475 and previous config saved to /var/cache/conftool/dbconfig/20230523-070713-root.json
  • 07:06 kartik@deploy1002: kartik: Backport for Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48474 and previous config saved to /var/cache/conftool/dbconfig/20230523-070547-root.json
  • 07:04 kartik@deploy1002: Started scap: Backport for Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)
  • 07:00 marostegui@deploy1002: Finished scap: Backport for Revert "db-production: Disable es4 writes" (duration: 06m 58s)
  • 06:54 marostegui@deploy1002: marostegui: Backport for Revert "db-production: Disable es4 writes" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 06:53 marostegui@deploy1002: Started scap: Backport for Revert "db-production: Disable es4 writes"
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48473 and previous config saved to /var/cache/conftool/dbconfig/20230523-065042-root.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Change es1020 weight', diff saved to https://phabricator.wikimedia.org/P48472 and previous config saved to /var/cache/conftool/dbconfig/20230523-064850-root.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1021 T337283', diff saved to https://phabricator.wikimedia.org/P48471 and previous config saved to /var/cache/conftool/dbconfig/20230523-064820-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1020 to es4 primary T337283', diff saved to https://phabricator.wikimedia.org/P48470 and previous config saved to /var/cache/conftool/dbconfig/20230523-064729-root.json
  • 06:46 marostegui: Starting es4 eqiad failover from es1021 to es1020 - T337283
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1020 with weight 0 T337283', diff saved to https://phabricator.wikimedia.org/P48469 and previous config saved to /var/cache/conftool/dbconfig/20230523-063836-root.json
  • 06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 T337283
  • 06:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 T337283
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48468 and previous config saved to /var/cache/conftool/dbconfig/20230523-063538-root.json
  • 06:26 marostegui@deploy1002: Finished scap: Backport for db-production: Disable es4 writes (T337283) (duration: 08m 21s)
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48467 and previous config saved to /var/cache/conftool/dbconfig/20230523-062033-root.json
  • 06:19 marostegui@deploy1002: marostegui: Backport for db-production: Disable es4 writes (T337283) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 06:18 marostegui@deploy1002: Started scap: Backport for db-production: Disable es4 writes (T337283)
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48466 and previous config saved to /var/cache/conftool/dbconfig/20230523-060528-root.json
  • 06:04 kart_: cxserver: Remove Flores MT service (T331505)
  • 06:03 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:02 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:00 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:00 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:56 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:56 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48465 and previous config saved to /var/cache/conftool/dbconfig/20230523-055024-root.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48464 and previous config saved to /var/cache/conftool/dbconfig/20230523-053519-root.json
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48463 and previous config saved to /var/cache/conftool/dbconfig/20230523-052014-root.json
  • 03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.8 (duration: 02m 17s)
  • 03:51 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.10 refs T330216 (duration: 49m 04s)
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.10 refs T330216
  • 02:57 eileen: civicrm upgraded from 3329155a to 6642b602
  • 02:22 eileen: civicrm upgraded from 7eae24d5 to 3329155a

2023-05-22

  • 23:29 eileen: civicrm upgraded from cc9593d0 to 7eae24d5
  • 23:16 zabe@deploy1002: Finished scap: Backport for Enable VE on new wikis (duration: 06m 58s)
  • 23:11 zabe@deploy1002: zabe: Backport for Enable VE on new wikis synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 23:09 zabe@deploy1002: Started scap: Backport for Enable VE on new wikis
  • 21:38 sbassett: Deployed security mitigations for T333140 and T336027
  • 20:55 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labstore1004.eqiad.wmnet
  • 20:55 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:54 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:53 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:51 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 20:45 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts labstore1004.eqiad.wmnet
  • 20:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labstore1005.eqiad.wmnet
  • 20:44 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:44 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:43 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: labstore1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 20:40 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 20:33 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts labstore1005.eqiad.wmnet
  • 20:27 TheresNoTime: close UTC late backport window
  • 20:24 samtar@deploy1002: Finished scap: Backport for [kaawiki] Enable SandboxLink extension (T336648) (duration: 07m 47s)
  • 20:17 samtar@deploy1002: samtar and superpes: Backport for [kaawiki] Enable SandboxLink extension (T336648) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:16 samtar@deploy1002: Started scap: Backport for [kaawiki] Enable SandboxLink extension (T336648)
  • 20:14 samtar@deploy1002: Finished scap: Backport for [ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625) (duration: 08m 22s)
  • 20:11 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs[2010-2011].codfw.wmnet
  • 20:09 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs[2010-2011].codfw.wmnet
  • 20:08 samtar@deploy1002: superpes and samtar: Backport for [ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:06 samtar@deploy1002: Started scap: Backport for [ruwiki] Add 'abusefilter log/view private' flags to ArbCom (T336625)
  • 19:22 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:22 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:20 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:20 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:18 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 17:04 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@5ee7a62]: (no justification provided) (duration: 00m 17s)
  • 17:03 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@5ee7a62]: (no justification provided)
  • 16:58 XioNoX: push mgmt_junos to all L2 switches
  • 16:35 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2009.codfw.wmnet
  • 16:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2009.codfw.wmnet
  • 15:57 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2009.codfw.wmnet
  • 15:56 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2009.codfw.wmnet
  • 15:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 15:26 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 15:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary
  • 15:25 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
  • 15:12 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "New debmonitor VMs - jmm@cumin2002 - T241049"
  • 15:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "New debmonitor VMs - jmm@cumin2002 - T241049"
  • 14:32 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:31 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:10 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:10 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host debmonitor2003.codfw.wmnet with OS bookworm
  • 12:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on debmonitor2003.codfw.wmnet with reason: host reimage
  • 12:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on debmonitor2003.codfw.wmnet with reason: host reimage
  • 12:20 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host debmonitor2003.codfw.wmnet with OS bookworm
  • 12:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host debmonitor1003.eqiad.wmnet with OS bookworm
  • 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on debmonitor1003.eqiad.wmnet with reason: host reimage
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2124', diff saved to https://phabricator.wikimedia.org/P48456 and previous config saved to /var/cache/conftool/dbconfig/20230522-115936-root.json
  • 11:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on debmonitor1003.eqiad.wmnet with reason: host reimage
  • 11:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host debmonitor1003.eqiad.wmnet with OS bookworm
  • 10:17 topranks: Un-draining transport circuit from eqsin to codfw, moving traffic back to default path T337220
  • 10:06 hashar@deploy1002: Finished scap: Backport for Revert "[WikibaseMediaInfo] Add 'main subject of' property" (duration: 37m 00s)
  • 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host debmonitor2003.codfw.wmnet
  • 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
  • 10:05 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
  • 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) debmonitor2003.codfw.wmnet on all recursors
  • 10:04 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache debmonitor2003.codfw.wmnet on all recursors
  • 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
  • 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor2003.codfw.wmnet - jmm@cumin2002"
  • 10:02 moritzm: installing updated usb.ids packages for Bullseye
  • 10:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host debmonitor2003.codfw.wmnet
  • 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host debmonitor1003.eqiad.wmnet
  • 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
  • 09:50 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
  • 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) debmonitor1003.eqiad.wmnet on all recursors
  • 09:49 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache debmonitor1003.eqiad.wmnet on all recursors
  • 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
  • 09:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM debmonitor1003.eqiad.wmnet - jmm@cumin2002"
  • 09:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:43 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host debmonitor1003.eqiad.wmnet
  • 09:39 hashar@deploy1002: hashar: Backport for Revert "[WikibaseMediaInfo] Add 'main subject of' property" synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 09:29 hashar@deploy1002: Started scap: Backport for Revert "[WikibaseMediaInfo] Add 'main subject of' property"
  • 08:46 marostegui: Stop mysql on db2160 (haproxy irc alerts will be generated)
  • 08:28 elukey: drain Arelion link between cr1-codfw and cr3-eqsin to mitigate packet loss eqiad <-> eqsin
  • 08:22 moritzm: installing systemd security updates
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48455 and previous config saved to /var/cache/conftool/dbconfig/20230522-081724-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48454 and previous config saved to /var/cache/conftool/dbconfig/20230522-080219-root.json
  • 07:59 elukey: restart purged on cp5017 as test to clear out consumer group timeouts and rejoin events
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48453 and previous config saved to /var/cache/conftool/dbconfig/20230522-075613-root.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48452 and previous config saved to /var/cache/conftool/dbconfig/20230522-074715-root.json
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48451 and previous config saved to /var/cache/conftool/dbconfig/20230522-074109-root.json
  • 07:37 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 07:32 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 07:32 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48450 and previous config saved to /var/cache/conftool/dbconfig/20230522-073210-root.json
  • 07:28 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48449 and previous config saved to /var/cache/conftool/dbconfig/20230522-072604-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48448 and previous config saved to /var/cache/conftool/dbconfig/20230522-071705-root.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48447 and previous config saved to /var/cache/conftool/dbconfig/20230522-071333-root.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48446 and previous config saved to /var/cache/conftool/dbconfig/20230522-071326-root.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48445 and previous config saved to /var/cache/conftool/dbconfig/20230522-071319-root.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48444 and previous config saved to /var/cache/conftool/dbconfig/20230522-071059-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48443 and previous config saved to /var/cache/conftool/dbconfig/20230522-070200-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48442 and previous config saved to /var/cache/conftool/dbconfig/20230522-065828-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48441 and previous config saved to /var/cache/conftool/dbconfig/20230522-065822-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48440 and previous config saved to /var/cache/conftool/dbconfig/20230522-065815-root.json
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48439 and previous config saved to /var/cache/conftool/dbconfig/20230522-065555-root.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48438 and previous config saved to /var/cache/conftool/dbconfig/20230522-064656-root.json
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 T337206', diff saved to https://phabricator.wikimedia.org/P48437 and previous config saved to /var/cache/conftool/dbconfig/20230522-064541-root.json
  • 06:45 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts bast2002
  • 06:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48436 and previous config saved to /var/cache/conftool/dbconfig/20230522-064323-root.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48435 and previous config saved to /var/cache/conftool/dbconfig/20230522-064317-root.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48434 and previous config saved to /var/cache/conftool/dbconfig/20230522-064310-root.json
  • 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1121.eqiad.wmnet
  • 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1121.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48433 and previous config saved to /var/cache/conftool/dbconfig/20230522-064050-root.json
  • 06:40 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1121.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 06:38 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:37 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast2002
  • 06:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1121.eqiad.wmnet
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'es2023 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48432 and previous config saved to /var/cache/conftool/dbconfig/20230522-063151-root.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48431 and previous config saved to /var/cache/conftool/dbconfig/20230522-062818-root.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48430 and previous config saved to /var/cache/conftool/dbconfig/20230522-062812-root.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48429 and previous config saved to /var/cache/conftool/dbconfig/20230522-062805-root.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48428 and previous config saved to /var/cache/conftool/dbconfig/20230522-062545-root.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Give weight to es2024', diff saved to https://phabricator.wikimedia.org/P48427 and previous config saved to /var/cache/conftool/dbconfig/20230522-061947-marostegui.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2023 T337204', diff saved to https://phabricator.wikimedia.org/P48426 and previous config saved to /var/cache/conftool/dbconfig/20230522-061925-root.json
  • 06:17 marostegui: Starting es5 codfw failover from es2023 to es2024 - T337204
  • 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 T337204
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2024 with weight 0 T337204', diff saved to https://phabricator.wikimedia.org/P48425 and previous config saved to /var/cache/conftool/dbconfig/20230522-061524-root.json
  • 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 T337204
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48424 and previous config saved to /var/cache/conftool/dbconfig/20230522-061314-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48423 and previous config saved to /var/cache/conftool/dbconfig/20230522-061307-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48422 and previous config saved to /var/cache/conftool/dbconfig/20230522-061300-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48421 and previous config saved to /var/cache/conftool/dbconfig/20230522-061040-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2021', diff saved to https://phabricator.wikimedia.org/P48420 and previous config saved to /var/cache/conftool/dbconfig/20230522-061033-marostegui.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48419 and previous config saved to /var/cache/conftool/dbconfig/20230522-055809-root.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48418 and previous config saved to /var/cache/conftool/dbconfig/20230522-055803-root.json
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48417 and previous config saved to /var/cache/conftool/dbconfig/20230522-055756-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'es2021 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48416 and previous config saved to /var/cache/conftool/dbconfig/20230522-055120-root.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48415 and previous config saved to /var/cache/conftool/dbconfig/20230522-054304-root.json
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48414 and previous config saved to /var/cache/conftool/dbconfig/20230522-054258-root.json
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48413 and previous config saved to /var/cache/conftool/dbconfig/20230522-054251-root.json
  • 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2021 T337203', diff saved to https://phabricator.wikimedia.org/P48412 and previous config saved to /var/cache/conftool/dbconfig/20230522-053705-marostegui.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2020 to es4 codfw primary T337203', diff saved to https://phabricator.wikimedia.org/P48411 and previous config saved to /var/cache/conftool/dbconfig/20230522-053554-marostegui.json
  • 05:34 marostegui: Starting es4 codfw failover from es2021 to es2020 - T337203
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2020 with weight 0 T337203', diff saved to https://phabricator.wikimedia.org/P48410 and previous config saved to /var/cache/conftool/dbconfig/20230522-052938-root.json
  • 05:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 T337203
  • 05:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 T337203
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48409 and previous config saved to /var/cache/conftool/dbconfig/20230522-052800-root.json
  • 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48408 and previous config saved to /var/cache/conftool/dbconfig/20230522-052753-root.json
  • 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48407 and previous config saved to /var/cache/conftool/dbconfig/20230522-052746-root.json
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1029, es1030, es1031 for kernel reboots', diff saved to https://phabricator.wikimedia.org/P48406 and previous config saved to /var/cache/conftool/dbconfig/20230522-051957-root.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Failover es1, es2 and es3 masters for kernel reboots', diff saved to https://phabricator.wikimedia.org/P48405 and previous config saved to /var/cache/conftool/dbconfig/20230522-051723-marostegui.json

2023-05-21

  • 07:45 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 07:44 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 07:43 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 07:42 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 07:41 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 07:40 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply

2023-05-20

  • 18:25 effie: restart varnish cp3061
  • 16:39 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=parse1018.eqiad.wmnet
  • 15:17 hoo@deploy1002: Finished scap: Backport for Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081) (duration: 08m 47s)
  • 15:10 hoo@deploy1002: hoo: Backport for Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 15:08 hoo@deploy1002: Started scap: Backport for Remove linkitem dependency on jquery.wikibase.wbtooltip (T337081)
  • 14:41 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=parse1018.eqiad.wmnet
  • 09:08 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:08 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added records for the new private.codfw.wikimedia.cloud domain - volans@cumin1001"
  • 09:07 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added records for the new private.codfw.wikimedia.cloud domain - volans@cumin1001"
  • 09:00 volans@cumin1001: START - Cookbook sre.dns.netbox

2023-05-19

  • 21:22 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:22 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
  • 21:21 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
  • 21:19 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 20:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1495.eqiad.wmnet
  • 19:46 mutante: mw1469 - sudo pkill ffmpeg (per runbook)
  • 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1469.eqiad.wmnet
  • 19:45 mutante: depooled mw1469 from videoscaler, dedicating to just jobrunner
  • 19:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1469.eqiad.wmnet
  • 19:36 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@b34c529]: (no justification provided) (duration: 00m 09s)
  • 19:36 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@b34c529]: (no justification provided)
  • 16:55 mutante: mw2448 - scap pull - T2334429
  • 15:31 taavi@deploy1002: Finished scap: Backport for i18n: Add link to help page (T322717), Enable RealMe (T324535) (duration: 22m 02s)
  • 15:21 taavi@deploy1002: legoktm and taavi: Backport for i18n: Add link to help page (T322717), Enable RealMe (T324535) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 15:09 taavi@deploy1002: Started scap: Backport for i18n: Add link to help page (T322717), Enable RealMe (T324535)
  • 15:06 legoktm@deploy1002: Finished scap: Backport for Disable GWToolset from Commons (T270911) (duration: 09m 46s)
  • 15:06 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 14:59 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad
  • 14:58 legoktm@deploy1002: legoktm: Backport for Disable GWToolset from Commons (T270911) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 14:57 legoktm@deploy1002: Started scap: Backport for Disable GWToolset from Commons (T270911)
  • 14:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 14:36 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
  • 14:36 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on stat1009.eqiad.wmnet with reason: Bringing stat1009 into service
  • 14:35 sukhe: enable puppet on A:lvs, finished rolling out change
  • 14:20 sukhe: disable puppet on A:lvs to roll out CR 910566
  • 14:17 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs1014.eqiad.wmnet with reason: firmware update
  • 14:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs1014.eqiad.wmnet with reason: firmware update
  • 13:35 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@be05071]: (no justification provided) (duration: 00m 10s)
  • 13:34 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs1020.eqiad.wmnet with reason: Move lvs1020 handoff port to row e/f from lsw1-f1 to ssw1-f1
  • 13:34 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@be05071]: (no justification provided)
  • 13:34 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs1020.eqiad.wmnet with reason: Move lvs1020 handoff port to row e/f from lsw1-f1 to ssw1-f1
  • 13:26 topranks: Adding vlan config for row e/f vlans on ssw1-f1-eqiad (T322937)
  • 13:17 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9 refs T330215
  • 12:19 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
  • 11:27 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
  • 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2004.codfw.wmnet with OS bullseye
  • 10:55 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts bast2002
  • 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast2002 decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
  • 10:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast2002 decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:50 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
  • 10:45 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1003.eqiad.wmnet
  • 10:44 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:38 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
  • 10:37 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast2002
  • 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2004.codfw.wmnet with OS bullseye
  • 10:07 moritzm: installing ncurses security updates
  • 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2002.codfw.wmnet with OS bullseye
  • 09:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:48 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
  • 09:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
  • 09:31 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2002.codfw.wmnet with OS bullseye
  • 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2040-2043].codfw.wmnet
  • 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:21 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2040-2043].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
  • 09:21 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
  • 09:18 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2040-2043].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
  • 09:15 mvernon@cumin2002: START - Cookbook sre.dns.netbox
  • 09:08 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
  • 09:02 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
  • 08:59 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2040-2043].codfw.wmnet
  • 08:58 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
  • 08:52 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
  • 08:45 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
  • 08:41 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
  • 08:38 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
  • 08:38 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 08:34 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
  • 08:31 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
  • 08:27 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
  • 08:18 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2003.codfw.wmnet
  • 08:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host netflow2003.codfw.wmnet with OS bookworm
  • 08:11 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2003.codfw.wmnet
  • 08:10 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2002.codfw.wmnet
  • 08:09 moritzm: copy samplicator from bullseye-wikimedia to bookworm-wikimedia T330884
  • 08:03 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2002.codfw.wmnet
  • 07:58 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2001.codfw.wmnet
  • 07:52 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2001.codfw.wmnet
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48397 and previous config saved to /var/cache/conftool/dbconfig/20230519-074256-root.json
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48396 and previous config saved to /var/cache/conftool/dbconfig/20230519-074044-root.json
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48395 and previous config saved to /var/cache/conftool/dbconfig/20230519-073959-root.json
  • 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow2003.codfw.wmnet with reason: host reimage
  • 07:31 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow2003.codfw.wmnet with reason: host reimage
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48394 and previous config saved to /var/cache/conftool/dbconfig/20230519-072751-root.json
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48393 and previous config saved to /var/cache/conftool/dbconfig/20230519-072539-root.json
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48392 and previous config saved to /var/cache/conftool/dbconfig/20230519-072454-root.json
  • 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: prometheus4001.ulsfo.wmnet
  • 07:21 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: prometheus4001.ulsfo.wmnet
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48391 and previous config saved to /var/cache/conftool/dbconfig/20230519-071247-root.json
  • 07:11 moritzm: installing emacs security updates
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48390 and previous config saved to /var/cache/conftool/dbconfig/20230519-071034-root.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48389 and previous config saved to /var/cache/conftool/dbconfig/20230519-070949-root.json
  • 06:59 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48388 and previous config saved to /var/cache/conftool/dbconfig/20230519-065742-root.json
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48387 and previous config saved to /var/cache/conftool/dbconfig/20230519-065530-root.json
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48386 and previous config saved to /var/cache/conftool/dbconfig/20230519-065445-root.json
  • 06:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6002.wikimedia.org
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48385 and previous config saved to /var/cache/conftool/dbconfig/20230519-064237-root.json
  • 06:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6002.wikimedia.org
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48384 and previous config saved to /var/cache/conftool/dbconfig/20230519-064025-root.json
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48383 and previous config saved to /var/cache/conftool/dbconfig/20230519-063940-root.json
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48382 and previous config saved to /var/cache/conftool/dbconfig/20230519-062733-root.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48381 and previous config saved to /var/cache/conftool/dbconfig/20230519-062520-root.json
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48380 and previous config saved to /var/cache/conftool/dbconfig/20230519-062435-root.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48379 and previous config saved to /var/cache/conftool/dbconfig/20230519-061228-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48378 and previous config saved to /var/cache/conftool/dbconfig/20230519-061016-root.json
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48377 and previous config saved to /var/cache/conftool/dbconfig/20230519-060931-root.json
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48376 and previous config saved to /var/cache/conftool/dbconfig/20230519-055723-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48375 and previous config saved to /var/cache/conftool/dbconfig/20230519-055511-root.json
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48374 and previous config saved to /var/cache/conftool/dbconfig/20230519-055426-root.json
  • 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2027', diff saved to https://phabricator.wikimedia.org/P48373 and previous config saved to /var/cache/conftool/dbconfig/20230519-054952-root.json
  • 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2034 to es3 master', diff saved to https://phabricator.wikimedia.org/P48372 and previous config saved to /var/cache/conftool/dbconfig/20230519-054923-marostegui.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2031', diff saved to https://phabricator.wikimedia.org/P48371 and previous config saved to /var/cache/conftool/dbconfig/20230519-054758-root.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2033 to es2 master', diff saved to https://phabricator.wikimedia.org/P48370 and previous config saved to /var/cache/conftool/dbconfig/20230519-054737-marostegui.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2030', diff saved to https://phabricator.wikimedia.org/P48369 and previous config saved to /var/cache/conftool/dbconfig/20230519-054503-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2032 to es1 master', diff saved to https://phabricator.wikimedia.org/P48368 and previous config saved to /var/cache/conftool/dbconfig/20230519-054403-marostegui.json
  • 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1121 from dbctl T336725', diff saved to https://phabricator.wikimedia.org/P48367 and previous config saved to /var/cache/conftool/dbconfig/20230519-053719-marostegui.json

2023-05-18

  • 23:26 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.9 refs T330215
  • 22:59 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade - bking@cumin1001 - T332355
  • 22:21 mutante: contint2001 - moving files owned by zuul to new UID/GID - in progress
  • 22:20 mutante: short down-time for zuul-merger on contint2001
  • 21:47 mutante: maintenance for zuul (CI) on contint servers
  • 21:31 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9 refs T330215
  • 21:13 brennen@deploy1002: Finished scap: Backport for cache: Do not throw on empty set in LinkBatch::constructSet (T336964) (duration: 09m 38s)
  • 21:05 brennen@deploy1002: brennen: Backport for cache: Do not throw on empty set in LinkBatch::constructSet (T336964) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:03 brennen@deploy1002: Started scap: Backport for cache: Do not throw on empty set in LinkBatch::constructSet (T336964)
  • 21:01 urbanecm@deploy1002: Finished scap: Backport for Silently ignore istype-depicts image suggestion type (T336962) (duration: 08m 09s)
  • 20:54 urbanecm@deploy1002: urbanecm: Backport for Silently ignore istype-depicts image suggestion type (T336962) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:53 urbanecm@deploy1002: Started scap: Backport for Silently ignore istype-depicts image suggestion type (T336962)
  • 20:36 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade - bking@cumin1001 - T332355
  • 20:33 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade - bking@cumin1001 - T332355
  • 20:16 urbanecm@deploy1002: Finished scap: Backport for Reverts hewiki A/B test (T335309) (duration: 10m 25s)
  • 20:07 urbanecm@deploy1002: ksarabia and urbanecm: Backport for Reverts hewiki A/B test (T335309) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:06 urbanecm@deploy1002: Started scap: Backport for Reverts hewiki A/B test (T335309)
  • 18:57 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@502ddae]: T333001 (duration: 00m 35s)
  • 18:56 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@502ddae]: T333001
  • 18:55 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin1001 - T332355
  • 18:50 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.8 refs T330215
  • 18:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts gitlab-runner1003.eqiad.wmnet
  • 18:31 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:31 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
  • 18:30 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
  • 18:27 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:20 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:20 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
  • 18:19 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add ssw1 irb int dns - cmooney@cumin1001"
  • 18:18 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.9 refs T330215
  • 18:11 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade - bking@cumin1001 - T332355
  • 18:09 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin1001 - T332355
  • 18:07 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin1001 - T274204
  • 18:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:59 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin1001 - T274204
  • 17:38 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:37 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 17:36 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 17:35 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 17:29 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:29 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:27 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:26 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 17:26 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 17:26 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 17:26 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:26 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:55 XioNoX: push new pfw policies - T336896
  • 16:21 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:21 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:10 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bullseye
  • 15:58 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:58 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:57 inflatador: bking@cumin1001 starting rolling restart of wcqs for java updates T334470
  • 15:53 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 15:50 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 15:47 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@6e3358d]: (no justification provided) (duration: 00m 10s)
  • 15:47 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@6e3358d]: (no justification provided)
  • 15:37 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
  • 15:37 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 15:31 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
  • 15:29 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
  • 15:25 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 15:23 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
  • 15:20 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2003.codfw.wmnet
  • 15:19 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
  • 15:18 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:18 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:17 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:16 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2003.codfw.wmnet
  • 15:15 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:13 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2002.codfw.wmnet
  • 15:09 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2002.codfw.wmnet
  • 15:08 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2001.codfw.wmnet
  • 15:04 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2001.codfw.wmnet
  • 15:03 stevemunene@deploy1002: Finished deploy [airflow-dags/analytics_product@6e3358d]: (no justification provided) (duration: 00m 06s)
  • 15:02 stevemunene@deploy1002: Started deploy [airflow-dags/analytics_product@6e3358d]: (no justification provided)
  • 14:59 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:59 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:57 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:56 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts gitlab-runner1003.eqiad.wmnet
  • 14:34 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
  • 14:31 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 14:31 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:30 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:01 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-serve-worker-codfw
  • 13:59 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
  • 13:52 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
  • 13:50 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
  • 13:49 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
  • 13:47 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
  • 13:18 TheresNoTime: closing backport window
  • 13:14 samtar@deploy1002: Finished scap: Backport for InitialiseSettings: Set wgWatchersMaxAge=30days (T336250) (duration: 08m 45s)
  • 13:07 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 13:07 samtar@deploy1002: samtar and s-mukuti: Backport for InitialiseSettings: Set wgWatchersMaxAge=30days (T336250) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:06 samtar@deploy1002: Started scap: Backport for InitialiseSettings: Set wgWatchersMaxAge=30days (T336250)
  • 13:02 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:59 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: Revert Enable First Input Delay events. This is causing validation errors as well as breakages in the hadoop ingestion pipeline - T332012 (duration: 06m 19s)
  • 12:57 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:56 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:54 elukey@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-staging-worker
  • 12:51 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
  • 12:51 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:51 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:46 otto@deploy1002: Synchronized wmf-config/ext-EventLogging.php: Revert Enable First Input Delay events. This is causing validation errors as well as breakages in the hadoop ingestion pipeline - T332012 (duration: 07m 00s)
  • 12:46 elukey: clean up old jupyterhub.service references (crash looping) on stat* nodes that had it
  • 12:44 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2002.codfw.wmnet
  • 12:35 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2002.codfw.wmnet
  • 12:35 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2001.codfw.wmnet
  • 12:35 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:34 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:28 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2001.codfw.wmnet
  • 12:24 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:24 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:20 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1003.eqiad.wmnet
  • 12:19 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:17 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:16 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1003.eqiad.wmnet
  • 12:15 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1002.eqiad.wmnet
  • 12:12 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:11 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1002.eqiad.wmnet
  • 12:06 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1001.eqiad.wmnet
  • 12:02 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1001.eqiad.wmnet
  • 11:56 topranks: reconfiguring DHCP relay function on eqiad core routers (T320508)
  • 11:55 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1001.eqiad.wmnet
  • 11:51 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1001.eqiad.wmnet
  • 11:36 kart_: MinT: Update to 2023-05-18-060931-production and Set CT2_INTRA_THREADS to 0 (T336483)
  • 11:34 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 11:28 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 11:23 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 11:20 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 11:11 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 11:09 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 11:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1003.eqiad.wmnet
  • 11:00 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1003.eqiad.wmnet
  • 10:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1002.eqiad.wmnet
  • 10:50 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1002.eqiad.wmnet
  • 10:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1001.eqiad.wmnet
  • 10:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-worker1110.eqiad.wmnet with reason: Troubleshooting failed disk
  • 10:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on an-worker1110.eqiad.wmnet with reason: Troubleshooting failed disk
  • 10:25 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1001.eqiad.wmnet
  • 10:24 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ml-cache1001.eqiad.wmnet
  • 10:24 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-cache1001.eqiad.wmnet
  • 10:06 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 10:05 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 08:30 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 08:29 akosiaris: upgrade docker-registry to 2.8.2 on all registry hosts
  • 08:28 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 08:27 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 08:26 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=registry2003.codfw.wmnet
  • 08:24 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 08:24 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 08:19 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 08:19 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 08:00 akosiaris: upgrade registry on registry2003 to 2.8.2
  • 07:59 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=registry2003.codfw.wmnet
  • 07:25 apergos: UTC morning backport and config training window done
  • 07:15 kartik@deploy1002: Finished scap: Backport for Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868) (duration: 09m 18s)
  • 07:07 kartik@deploy1002: kartik: Backport for Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 07:06 kartik@deploy1002: Started scap: Backport for Enable the new Special:Contribute page entry point for desktop on selected wikis (T327868)
  • 06:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2134,2160].codfw.wmnet,db[1159,1217].eqiad.wmnet with reason: maintenance
  • 06:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2134,2160].codfw.wmnet,db[1159,1217].eqiad.wmnet with reason: maintenance
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1122 from dbctl T336833', diff saved to https://phabricator.wikimedia.org/P48362 and previous config saved to /var/cache/conftool/dbconfig/20230518-060734-marostegui.json
  • 04:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: maintenance
  • 04:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: maintenance

2023-05-17

  • 22:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:30 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove new openstack.codfw1dev.wikimediacloud.org name server A records. - cmooney@cumin1001"
  • 22:29 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove new openstack.codfw1dev.wikimediacloud.org name server A records. - cmooney@cumin1001"
  • 22:26 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 22:15 krinkle@deploy1002: Synchronized wmf-config/: T332012 (duration: 06m 51s)
  • 21:44 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2012.codfw.wmnet
  • 21:26 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 21:26 bking@cumin1001: START - Cookbook sre.hosts.downtime for 12 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 21:01 zabe: mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Public policy" "Global Advocacy" "Zabe" --reason "per request T333842"
  • 20:59 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2012.codfw.wmnet
  • 20:32 urbanecm: UTC late B&C window done
  • 20:29 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: amend wrong wiki prefix for jbowiki (T308134), NewTopicOptOutActiveUsers: Skip bot users etc. (T317375), Enable zebra ab test in hewiki (T335972) (duration: 11m 36s)
  • 20:19 urbanecm@deploy1002: urbanecm and matmarex and ksarabia and sgimeno: Backport for GrowthExperiments: amend wrong wiki prefix for jbowiki (T308134), NewTopicOptOutActiveUsers: Skip bot users etc. (T317375), Enable zebra ab test in hewiki (T335972) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.
  • 20:17 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: amend wrong wiki prefix for jbowiki (T308134), NewTopicOptOutActiveUsers: Skip bot users etc. (T317375), Enable zebra ab test in hewiki (T335972)
  • 20:15 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: enable add link frontend in 9th round wikis (T308134) (duration: 12m 06s)
  • 20:13 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2012.codfw.wmnet
  • 20:12 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2012.codfw.wmnet
  • 20:07 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2012.codfw.wmnet
  • 20:04 urbanecm@deploy1002: sgimeno and urbanecm: Backport for GrowthExperiments: enable add link frontend in 9th round wikis (T308134) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:03 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: enable add link frontend in 9th round wikis (T308134)
  • 19:55 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 19:54 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 19:54 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2012.codfw.wmnet
  • 19:50 ejegg: payments-wiki upgraded from 8988a598 to a7567c6a
  • 19:41 inflatador: bking@wdqs2012 depooling to attempt firmware update T331297
  • 19:01 Amir1: Removing db1112 from zarcillo T336332
  • 18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1112.eqiad.wmnet
  • 18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1112.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 18:58 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1112.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 18:48 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 18:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1112.eqiad.wmnet
  • 18:34 brennen@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.9 refs T330215 (duration: 06m 22s)
  • 18:27 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.9 refs T330215
  • 18:11 otto@deploy1002: Finished deploy [analytics/refinery@fb22795]: Deploy for ProduceCanaryEvents fix - [analytics/refinery@fb22795] (duration: 09m 14s)
  • 18:03 brennen: train 1.41.0-wmf.9 (T330215): no current blockers, rolling to group1 as backup-backup conductor
  • 18:02 otto@deploy1002: Started deploy [analytics/refinery@fb22795]: Deploy for ProduceCanaryEvents fix - [analytics/refinery@fb22795]
  • 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 17:43 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync
  • 17:43 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync
  • 17:19 brett: Maglev LVS scheduler rollout finished in esams - T263797
  • 16:58 Guest4300: Running `foreachwiki extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --video --mime=video/mpeg --missing --error --stalled --throttle` on mwmaint1002 for T244570
  • 16:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032 (T335845)', diff saved to https://phabricator.wikimedia.org/P48356 and previous config saved to /var/cache/conftool/dbconfig/20230517-162444-ladsgroup.json
  • 16:21 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 16:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032 (T335845)', diff saved to https://phabricator.wikimedia.org/P48355 and previous config saved to /var/cache/conftool/dbconfig/20230517-161929-ladsgroup.json
  • 16:18 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 16:17 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 16:14 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:13 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P48354 and previous config saved to /var/cache/conftool/dbconfig/20230517-160937-ladsgroup.json
  • 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032', diff saved to https://phabricator.wikimedia.org/P48353 and previous config saved to /var/cache/conftool/dbconfig/20230517-160423-ladsgroup.json
  • 16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:57 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 15:56 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P48352 and previous config saved to /var/cache/conftool/dbconfig/20230517-155431-ladsgroup.json
  • 15:52 brett: Rolling out maglev LVS scheduler in esams - T263797
  • 15:52 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 15:50 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032', diff saved to https://phabricator.wikimedia.org/P48351 and previous config saved to /var/cache/conftool/dbconfig/20230517-154916-ladsgroup.json
  • 15:46 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1032 (T335845)', diff saved to https://phabricator.wikimedia.org/P48350 and previous config saved to /var/cache/conftool/dbconfig/20230517-153925-ladsgroup.json
  • 15:38 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2032 (T335845)', diff saved to https://phabricator.wikimedia.org/P48349 and previous config saved to /var/cache/conftool/dbconfig/20230517-153410-ladsgroup.json
  • 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1032 (T335845)', diff saved to https://phabricator.wikimedia.org/P48348 and previous config saved to /var/cache/conftool/dbconfig/20230517-153042-ladsgroup.json
  • 15:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance
  • 15:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance
  • 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2032 (T335845)', diff saved to https://phabricator.wikimedia.org/P48347 and previous config saved to /var/cache/conftool/dbconfig/20230517-153010-ladsgroup.json
  • 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027 (T335845)', diff saved to https://phabricator.wikimedia.org/P48346 and previous config saved to /var/cache/conftool/dbconfig/20230517-153004-ladsgroup.json
  • 15:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028 (T335845)', diff saved to https://phabricator.wikimedia.org/P48345 and previous config saved to /var/cache/conftool/dbconfig/20230517-152945-ladsgroup.json
  • 15:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2002.wikimedia.org
  • 15:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2002.wikimedia.org
  • 15:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027', diff saved to https://phabricator.wikimedia.org/P48344 and previous config saved to /var/cache/conftool/dbconfig/20230517-151458-ladsgroup.json
  • 15:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P48343 and previous config saved to /var/cache/conftool/dbconfig/20230517-151438-ladsgroup.json
  • 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zookeeper-test1002.eqiad.wmnet
  • 15:07 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 15:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host zookeeper-test1002.eqiad.wmnet
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027', diff saved to https://phabricator.wikimedia.org/P48342 and previous config saved to /var/cache/conftool/dbconfig/20230517-145952-ladsgroup.json
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P48341 and previous config saved to /var/cache/conftool/dbconfig/20230517-145932-ladsgroup.json
  • 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P{aqs101[6-9]*} and A:aqs
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1027 (T335845)', diff saved to https://phabricator.wikimedia.org/P48340 and previous config saved to /var/cache/conftool/dbconfig/20230517-144446-ladsgroup.json
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2028 (T335845)', diff saved to https://phabricator.wikimedia.org/P48339 and previous config saved to /var/cache/conftool/dbconfig/20230517-144425-ladsgroup.json
  • 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2028 (T335845)', diff saved to https://phabricator.wikimedia.org/P48338 and previous config saved to /var/cache/conftool/dbconfig/20230517-144025-ladsgroup.json
  • 14:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance
  • 14:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1027 (T335845)', diff saved to https://phabricator.wikimedia.org/P48337 and previous config saved to /var/cache/conftool/dbconfig/20230517-143949-ladsgroup.json
  • 14:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1027.eqiad.wmnet with reason: Maintenance
  • 14:39 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams - EventBus: produce to mediawiki.page_change.v1 stream - T336817 (duration: 06m 20s)
  • 14:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1027.eqiad.wmnet with reason: Maintenance
  • 14:38 btullis@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:dse-k8s-worker
  • 14:36 moritzm: installing jackson-databind security updates
  • 14:34 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@ad1cc7c]: deploying hotfix for T336800 (duration: 00m 09s)
  • 14:34 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@ad1cc7c]: deploying hotfix for T336800
  • 14:33 ottomata: EventBus: produce to mediawiki.page_change.v1 stream - T336817
  • 14:30 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 14:30 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 14:28 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 14:28 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 14:27 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
  • 14:27 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
  • 14:27 ottomata: rolling restart of eventgate-main to pick up new mediawiki.page_change.v1 stream config - T336817
  • 14:17 elukey: run authdns-update for new ml-serve/ores discovery endpoints - T336726
  • 14:15 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P{aqs101[6-9]*} and A:aqs
  • 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P{aqs101[2-5]*} and A:aqs
  • 14:14 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: wgEventStreams - Declare mediawiki.page_change.v1 stream - T336817 (duration: 07m 30s)
  • 14:10 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:09 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:09 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:08 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:07 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1101.eqiad.wmnet
  • 13:59 taavi@deploy1002: Finished scap: Backport for Define $maintClass in maintenance script for compatibility (T317375) (duration: 07m 24s)
  • 13:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1101.eqiad.wmnet
  • 13:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1100.eqiad.wmnet
  • 13:54 taavi@deploy1002: matmarex and taavi: Backport for Define $maintClass in maintenance script for compatibility (T317375) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:52 taavi@deploy1002: Started scap: Backport for Define $maintClass in maintenance script for compatibility (T317375)
  • 13:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1100.eqiad.wmnet
  • 13:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1099.eqiad.wmnet
  • 13:47 taavi@deploy1002: Finished scap: Backport for dblists: Close akwiki (T336675) (duration: 08m 11s)
  • 13:42 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P{aqs101[2-5]*} and A:aqs
  • 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P{aqs102[0-1]*} and A:aqs
  • 13:41 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1099.eqiad.wmnet
  • 13:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1098.eqiad.wmnet
  • 13:40 taavi@deploy1002: taavi and maurelio: Backport for dblists: Close akwiki (T336675) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:38 taavi@deploy1002: Started scap: Backport for dblists: Close akwiki (T336675)
  • 13:38 taavi@deploy1002: Finished scap: Backport for plwiki: Show language selector in main page header (T336707) (duration: 07m 39s)
  • 13:33 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1098.eqiad.wmnet
  • 13:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1097.eqiad.wmnet
  • 13:32 taavi@deploy1002: stang and taavi: Backport for plwiki: Show language selector in main page header (T336707) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:30 taavi@deploy1002: Started scap: Backport for plwiki: Show language selector in main page header (T336707)
  • 13:29 taavi@deploy1002: Finished scap: Backport for Enable wmgWikibaseTmpWbsubscribersSensibleOutput on wikidata (T336760), Enable wmgWikibaseTmpEnableLabelsInApiSummaries on Wikidata (T335099) (duration: 09m 15s)
  • 13:25 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P{aqs102[0-1]*} and A:aqs
  • 13:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1097.eqiad.wmnet
  • 13:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1096.eqiad.wmnet
  • 13:25 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on P{aqs1011*} and A:aqs
  • 13:24 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:23 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:23 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:22 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:22 taavi@deploy1002: gtzatchkova and taavi: Backport for Enable wmgWikibaseTmpWbsubscribersSensibleOutput on wikidata (T336760), Enable wmgWikibaseTmpEnableLabelsInApiSummaries on Wikidata (T335099) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:22 btullis@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
  • 13:20 taavi@deploy1002: Started scap: Backport for Enable wmgWikibaseTmpWbsubscribersSensibleOutput on wikidata (T336760), Enable wmgWikibaseTmpEnableLabelsInApiSummaries on Wikidata (T335099)
  • 13:20 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:19 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:18 daniel@deploy1002: Finished scap: Backport for Revert "Revert "Add getMultiHttpClient function to make HTTP requests to Mathoid."" (T335347), Use MultiHttpClient instead of VirtualRESTService. (T335347) (duration: 11m 52s)
  • 13:17 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on P{aqs1011*} and A:aqs
  • 13:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1096.eqiad.wmnet
  • 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on A:aqs-canary
  • 13:07 daniel@deploy1002: daniel: Backport for Revert "Revert "Add getMultiHttpClient function to make HTTP requests to Mathoid."" (T335347), Use MultiHttpClient instead of VirtualRESTService. (T335347) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:06 daniel@deploy1002: Started scap: Backport for Revert "Revert "Add getMultiHttpClient function to make HTTP requests to Mathoid."" (T335347), Use MultiHttpClient instead of VirtualRESTService. (T335347)
  • 13:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1004.eqiad.wmnet
  • 13:00 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on A:aqs-canary
  • 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034 (T335845)', diff saved to https://phabricator.wikimedia.org/P48335 and previous config saved to /var/cache/conftool/dbconfig/20230517-125952-ladsgroup.json
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033 (T335845)', diff saved to https://phabricator.wikimedia.org/P48334 and previous config saved to /var/cache/conftool/dbconfig/20230517-125824-ladsgroup.json
  • 12:56 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1004.eqiad.wmnet
  • 12:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1003.eqiad.wmnet
  • 12:54 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:54 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records following puppetdb bulk import - cmooney@cumin1001"
  • 12:52 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records following puppetdb bulk import - cmooney@cumin1001"
  • 12:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1003.eqiad.wmnet
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034', diff saved to https://phabricator.wikimedia.org/P48333 and previous config saved to /var/cache/conftool/dbconfig/20230517-124446-ladsgroup.json
  • 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P48332 and previous config saved to /var/cache/conftool/dbconfig/20230517-124318-ladsgroup.json
  • 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034', diff saved to https://phabricator.wikimedia.org/P48331 and previous config saved to /var/cache/conftool/dbconfig/20230517-122940-ladsgroup.json
  • 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P48330 and previous config saved to /var/cache/conftool/dbconfig/20230517-122812-ladsgroup.json
  • 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1034 (T335845)', diff saved to https://phabricator.wikimedia.org/P48329 and previous config saved to /var/cache/conftool/dbconfig/20230517-121434-ladsgroup.json
  • 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1033 (T335845)', diff saved to https://phabricator.wikimedia.org/P48328 and previous config saved to /var/cache/conftool/dbconfig/20230517-121306-ladsgroup.json
  • 12:12 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 12:11 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 12:06 topranks: Merging CR822439 and beginning bulk puppetdb -> netbox import to update host interfaces
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1034 (T335845)', diff saved to https://phabricator.wikimedia.org/P48327 and previous config saved to /var/cache/conftool/dbconfig/20230517-115943-ladsgroup.json
  • 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance
  • 11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 (T335845)', diff saved to https://phabricator.wikimedia.org/P48326 and previous config saved to /var/cache/conftool/dbconfig/20230517-115908-ladsgroup.json
  • 11:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1033 (T335845)', diff saved to https://phabricator.wikimedia.org/P48325 and previous config saved to /var/cache/conftool/dbconfig/20230517-115612-ladsgroup.json
  • 11:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
  • 11:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
  • 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026 (T335845)', diff saved to https://phabricator.wikimedia.org/P48324 and previous config saved to /var/cache/conftool/dbconfig/20230517-115538-ladsgroup.json
  • 11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034 (T335845)', diff saved to https://phabricator.wikimedia.org/P48323 and previous config saved to /var/cache/conftool/dbconfig/20230517-115303-ladsgroup.json
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P48322 and previous config saved to /var/cache/conftool/dbconfig/20230517-114402-ladsgroup.json
  • 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P48321 and previous config saved to /var/cache/conftool/dbconfig/20230517-114032-ladsgroup.json
  • 11:38 kart_: Update MinT to 2023-05-17-052844-production: Set CT2_USE_EXPERIMENTAL_PACKED_GEMM for better performance
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034', diff saved to https://phabricator.wikimedia.org/P48320 and previous config saved to /var/cache/conftool/dbconfig/20230517-113757-ladsgroup.json
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033 (T335845)', diff saved to https://phabricator.wikimedia.org/P48319 and previous config saved to /var/cache/conftool/dbconfig/20230517-113531-ladsgroup.json
  • 11:33 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P48318 and previous config saved to /var/cache/conftool/dbconfig/20230517-112856-ladsgroup.json
  • 11:28 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 11:26 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P48317 and previous config saved to /var/cache/conftool/dbconfig/20230517-112526-ladsgroup.json
  • 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034', diff saved to https://phabricator.wikimedia.org/P48316 and previous config saved to /var/cache/conftool/dbconfig/20230517-112251-ladsgroup.json
  • 11:22 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033', diff saved to https://phabricator.wikimedia.org/P48315 and previous config saved to /var/cache/conftool/dbconfig/20230517-112024-ladsgroup.json
  • 11:15 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1028 (T335845)', diff saved to https://phabricator.wikimedia.org/P48314 and previous config saved to /var/cache/conftool/dbconfig/20230517-111350-ladsgroup.json
  • 11:13 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 11:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1026 (T335845)', diff saved to https://phabricator.wikimedia.org/P48313 and previous config saved to /var/cache/conftool/dbconfig/20230517-111020-ladsgroup.json
  • 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2034 (T335845)', diff saved to https://phabricator.wikimedia.org/P48312 and previous config saved to /var/cache/conftool/dbconfig/20230517-110745-ladsgroup.json
  • 11:07 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 11:06 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 11:05 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033', diff saved to https://phabricator.wikimedia.org/P48311 and previous config saved to /var/cache/conftool/dbconfig/20230517-110518-ladsgroup.json
  • 11:05 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 11:04 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 11:04 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2034 (T335845)', diff saved to https://phabricator.wikimedia.org/P48310 and previous config saved to /var/cache/conftool/dbconfig/20230517-110251-ladsgroup.json
  • 11:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: Maintenance
  • 11:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: Maintenance
  • 11:02 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 11:01 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 11:01 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1026 (T335845)', diff saved to https://phabricator.wikimedia.org/P48309 and previous config saved to /var/cache/conftool/dbconfig/20230517-110130-ladsgroup.json
  • 11:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance
  • 11:01 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 11:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance
  • 11:00 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 11:00 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1028 (T335845)', diff saved to https://phabricator.wikimedia.org/P48308 and previous config saved to /var/cache/conftool/dbconfig/20230517-105957-ladsgroup.json
  • 10:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
  • 10:59 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 10:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
  • 10:59 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 10:58 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 10:58 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 10:57 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 10:57 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2033 (T335845)', diff saved to https://phabricator.wikimedia.org/P48307 and previous config saved to /var/cache/conftool/dbconfig/20230517-105012-ladsgroup.json
  • 10:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2033 (T335845)', diff saved to https://phabricator.wikimedia.org/P48306 and previous config saved to /var/cache/conftool/dbconfig/20230517-104519-ladsgroup.json
  • 10:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: Maintenance
  • 10:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: Maintenance
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026 (T335845)', diff saved to https://phabricator.wikimedia.org/P48305 and previous config saved to /var/cache/conftool/dbconfig/20230517-104454-ladsgroup.json
  • 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P48304 and previous config saved to /var/cache/conftool/dbconfig/20230517-103815-ladsgroup.json
  • 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48303 and previous config saved to /var/cache/conftool/dbconfig/20230517-103129-root.json
  • 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026', diff saved to https://phabricator.wikimedia.org/P48302 and previous config saved to /var/cache/conftool/dbconfig/20230517-102948-ladsgroup.json
  • 10:26 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 10:25 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P48301 and previous config saved to /var/cache/conftool/dbconfig/20230517-102310-ladsgroup.json
  • 10:19 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 10:18 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 10:17 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 10:17 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48300 and previous config saved to /var/cache/conftool/dbconfig/20230517-101624-root.json
  • 10:16 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 10:16 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026', diff saved to https://phabricator.wikimedia.org/P48299 and previous config saved to /var/cache/conftool/dbconfig/20230517-101442-ladsgroup.json
  • 10:09 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 10:08 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 10:08 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 10:08 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P48298 and previous config saved to /var/cache/conftool/dbconfig/20230517-100805-ladsgroup.json
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48297 and previous config saved to /var/cache/conftool/dbconfig/20230517-100120-root.json
  • 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2026 (T335845)', diff saved to https://phabricator.wikimedia.org/P48296 and previous config saved to /var/cache/conftool/dbconfig/20230517-095936-ladsgroup.json
  • 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2026 (T335845)', diff saved to https://phabricator.wikimedia.org/P48295 and previous config saved to /var/cache/conftool/dbconfig/20230517-095443-ladsgroup.json
  • 09:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2026.codfw.wmnet with reason: Maintenance
  • 09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2026.codfw.wmnet with reason: Maintenance
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P48294 and previous config saved to /var/cache/conftool/dbconfig/20230517-095301-ladsgroup.json
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48293 and previous config saved to /var/cache/conftool/dbconfig/20230517-094615-root.json
  • 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2029 (T335845)', diff saved to https://phabricator.wikimedia.org/P48292 and previous config saved to /var/cache/conftool/dbconfig/20230517-093928-ladsgroup.json
  • 09:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance
  • 09:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance
  • 09:39 elukey: roll restart pybal on lvs2010, lvs2009, lvs1020, lvs1019 to pick up a VIP (see https://gerrit.wikimedia.org/r/c/operations/puppet/+/920219) - T336726
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48291 and previous config saved to /var/cache/conftool/dbconfig/20230517-093110-root.json
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48290 and previous config saved to /var/cache/conftool/dbconfig/20230517-091606-root.json
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1220 cleaning gtid_domain_id', diff saved to https://phabricator.wikimedia.org/P48289 and previous config saved to /var/cache/conftool/dbconfig/20230517-091407-root.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48288 and previous config saved to /var/cache/conftool/dbconfig/20230517-085855-root.json
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48287 and previous config saved to /var/cache/conftool/dbconfig/20230517-084350-root.json
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48285 and previous config saved to /var/cache/conftool/dbconfig/20230517-082846-root.json
  • 08:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
  • 08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48284 and previous config saved to /var/cache/conftool/dbconfig/20230517-081341-root.json
  • 08:08 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:08 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:05 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:04 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48283 and previous config saved to /var/cache/conftool/dbconfig/20230517-075836-root.json
  • 07:57 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 07:57 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 07:48 moritzm: upgrading krb1001 to Bullseye T331695
  • 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on krb1001.eqiad.wmnet with reason: Update to Bullseye
  • 07:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on krb1001.eqiad.wmnet with reason: Update to Bullseye
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48278 and previous config saved to /var/cache/conftool/dbconfig/20230517-074332-root.json
  • 07:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'clear' for AS: 37468
  • 07:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'clear' for AS: 37468
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 4%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48277 and previous config saved to /var/cache/conftool/dbconfig/20230517-072827-root.json
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1122 for decommissioning', diff saved to https://phabricator.wikimedia.org/P48276 and previous config saved to /var/cache/conftool/dbconfig/20230517-072508-root.json
  • 07:19 kartik@deploy1002: Finished scap: Backport for Revert "Enable the new Special:Contribute page entry point for desktop on selected wikis" (duration: 07m 22s)
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48275 and previous config saved to /var/cache/conftool/dbconfig/20230517-071428-root.json
  • 07:13 kartik@deploy1002: trainbranchbot and kartik: Backport for Revert "Enable the new Special:Contribute page entry point for desktop on selected wikis" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 3%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48274 and previous config saved to /var/cache/conftool/dbconfig/20230517-071322-root.json
  • 07:11 kartik@deploy1002: Started scap: Backport for Revert "Enable the new Special:Contribute page entry point for desktop on selected wikis"
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 T336725', diff saved to https://phabricator.wikimedia.org/P48273 and previous config saved to /var/cache/conftool/dbconfig/20230517-071039-root.json
  • 07:09 kartik@deploy1002: Backport cancelled.
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48272 and previous config saved to /var/cache/conftool/dbconfig/20230517-065923-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48271 and previous config saved to /var/cache/conftool/dbconfig/20230517-065817-root.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48270 and previous config saved to /var/cache/conftool/dbconfig/20230517-064419-root.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: Repooling after a crash', diff saved to https://phabricator.wikimedia.org/P48269 and previous config saved to /var/cache/conftool/dbconfig/20230517-064313-root.json
  • 06:40 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 06:39 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 06:39 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 06:38 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 06:37 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 06:37 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48268 and previous config saved to /var/cache/conftool/dbconfig/20230517-062914-root.json
  • 06:22 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 06:21 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 06:20 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 06:20 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 06:19 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 06:18 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48267 and previous config saved to /var/cache/conftool/dbconfig/20230517-061409-root.json
  • 06:01 volans: restarted ferm on ms-be1047
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48265 and previous config saved to /var/cache/conftool/dbconfig/20230517-055904-root.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2096', diff saved to https://phabricator.wikimedia.org/P48264 and previous config saved to /var/cache/conftool/dbconfig/20230517-055310-root.json
  • 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1115.eqiad.wmnet
  • 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1115.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 05:48 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1115.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 05:46 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 05:41 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1115.eqiad.wmnet
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1112 from dbctl T336332', diff saved to https://phabricator.wikimedia.org/P48263 and previous config saved to /var/cache/conftool/dbconfig/20230517-052007-marostegui.json
  • 05:16 marostegui: Optimize s7 on dbstore1003 T336733
  • 00:21 krinkle@deploy1002: Synchronized src/: I4cfa4a2474b4e (duration: 06m 01s)
  • 00:15 krinkle@deploy1002: Synchronized wmf-config/: I4cfa4a2474b4e (duration: 06m 14s)
  • 00:07 krinkle@deploy1002: Synchronized lib/: I4cfa4a2474b4e (duration: 06m 51s)

2023-05-16

  • 20:59 jdrewniak@deploy1002: Finished scap: Backport for Add maint script to opt out active users from the new topic tool (T317375) (duration: 07m 18s)
  • 20:53 jdrewniak@deploy1002: jdrewniak and matmarex: Backport for Add maint script to opt out active users from the new topic tool (T317375) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:52 jdrewniak@deploy1002: Started scap: Backport for Add maint script to opt out active users from the new topic tool (T317375)
  • 20:49 volans@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-a8-codfw.mgmt.codfw.wmnet
  • 20:49 jdrewniak@deploy1002: Finished scap: Backport for Consolidate watchstar icon updating logic under watchstar.js (T336640 T336641) (duration: 09m 19s)
  • 20:41 jdrewniak@deploy1002: jdrewniak: Backport for Consolidate watchstar icon updating logic under watchstar.js (T336640 T336641) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:39 jdrewniak@deploy1002: Started scap: Backport for Consolidate watchstar icon updating logic under watchstar.js (T336640 T336641)
  • 20:36 jdrewniak@deploy1002: Finished scap: Backport for Ensure mw-watchlink is used for the sticky header watchlink (T336640 T336641) (duration: 07m 44s)
  • 20:30 jdrewniak@deploy1002: jdrewniak: Backport for Ensure mw-watchlink is used for the sticky header watchlink (T336640 T336641) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 20:30 brett: Rolling out maglev LVS scheduler in drmrs (for real this time) - T263797
  • 20:29 jdrewniak@deploy1002: Started scap: Backport for Ensure mw-watchlink is used for the sticky header watchlink (T336640 T336641)
  • 19:13 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:13 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002"
  • 19:12 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002"
  • 19:10 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 19:10 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a8-codfw.mgmt.codfw.wmnet
  • 19:04 sukhe: dummy run of authdns-update to confirm new hosts
  • 19:00 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns2003.wikimedia.org
  • 19:00 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:00 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:59 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:57 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 18:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw
  • 18:54 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw
  • 18:52 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns2003.wikimedia.org
  • 18:50 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2022.*
  • 18:50 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2021.*
  • 18:50 volans@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a8-codfw.mgmt.codfw.wmnet
  • 18:50 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:50 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin2002"
  • 18:49 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin2002"
  • 18:47 ryankemper: [WDQS] Pooled `wdqs2012`
  • 18:46 ryankemper: [WDQS] Pooled `wdqs2006` (not sure why it was depooled)
  • 18:46 sukhe: homer "cr*-codfw*" commit "Gerrit: 920363 remove to-be decommissioned host dns2003": T335777
  • 18:46 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 18:43 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:43 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002"
  • 18:42 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin2002"
  • 18:41 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 18:41 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a8-codfw.mgmt.codfw.wmnet
  • 18:36 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.48 208.80.153.74 208.80.153.107 ]: T326688
  • 18:34 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.9 refs T330215
  • 18:28 sukhe: homer "cr*-codfw*" commit "Gerrit: 920358 add new DNS host dns2006": T326688
  • 18:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2006.wikimedia.org with OS bullseye
  • 18:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2006.wikimedia.org with reason: host reimage
  • 18:02 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2006.wikimedia.org with reason: host reimage
  • 18:01 sukhe: enable puppet on A:cp-text
  • 17:58 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 17:57 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 17:56 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 17:55 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 17:52 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 17:52 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 17:47 volans@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a8-codfw.mgmt.codfw.wmnet
  • 17:47 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:47 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin1001"
  • 17:46 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a8-codfw - volans@cumin1001"
  • 17:45 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2006.wikimedia.org with OS bullseye
  • 17:44 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:40 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:40 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin1001"
  • 17:40 moritzm: installing avahi security updates on buster
  • 17:39 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a8-codfw - volans@cumin1001"
  • 17:37 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:37 volans@cumin1001: START - Cookbook sre.network.provision for device ssw1-a8-codfw.mgmt.codfw.wmnet
  • 17:34 joal@deploy1002: Finished deploy [airflow-dags/analytics@7816937]: Regular analytics weekly train - Hotfix [airflow-dags@7816937] (duration: 00m 10s)
  • 17:34 joal@deploy1002: Started deploy [airflow-dags/analytics@7816937]: Regular analytics weekly train - Hotfix [airflow-dags@7816937]
  • 17:27 volans@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 17:27 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:27 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin1001"
  • 17:27 brett: Rolling out maglev LVS scheduler in drmrs - T263797
  • 17:26 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin1001"
  • 17:24 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:20 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:20 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin1001"
  • 17:19 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin1001"
  • 17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns2002.wikimedia.org
  • 17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 17:17 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:17 volans@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 17:16 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 17:14 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 17:09 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns2002.wikimedia.org
  • 17:00 sukhe: homer "cr*-codfw*" commit "Gerrit: 920320 remove to-be decommissioned host dns2002" T335777
  • 16:59 moritzm: installing 5.10.179 kernels on Bullseye hosts
  • 16:55 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor100[1256].eqiad.wmnet
  • 16:30 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 16:30 volans: restarting wikibugs ( https://www.mediawiki.org/wiki/Wikibugs#Help )
  • 16:06 mutante: gitlab-runner2003 - installed rsync client for debugging an issue with rsync from inside containers, comparing against rsync from outside the container
  • 15:49 sukhe: run authdns-update for CR 920314
  • 15:41 joal@deploy1002: Finished deploy [airflow-dags/analytics@7fa2dcd]: Regular analytics weekly train [airflow-dags@7fa2dcd] (duration: 00m 10s)
  • 15:41 joal@deploy1002: Started deploy [airflow-dags/analytics@7fa2dcd]: Regular analytics weekly train [airflow-dags@7fa2dcd]
  • 15:36 hashar: Some CI jobs started failing after an upgrade of some Jenkins plugins. I have upgraded a couple more and it seems to work now T336775
  • 15:33 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.10 208.80.153.48 208.80.153.74 ]: T326688
  • 15:33 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.10 208.80.153.48 208.80.153.74 ]
  • 15:32 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 15:32 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 15:27 hashar: Restarting CI Jenkins
  • 15:26 Emperor: rebalance codfw swift rings T335280
  • 15:18 hashar: CI Jenkins jobs are stalled following the plugins upgrade :/
  • 15:07 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:04 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:03 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 14:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 14:55 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:49 moritzm: installing libxml2 security updates on buster
  • 14:48 sukhe: [done] "cr*-codfw*" commit "Gerrit: 919876 add new DNS host dns2005": T326688
  • 14:47 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:46 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:43 hashar: Restarting CI Jenkins
  • 14:42 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 14:42 sukhe: "cr*-codfw*" commit "Gerrit: 919876 add new DNS host dns2005": T326688
  • 14:36 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:32 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:32 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 14:32 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:31 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:31 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:30 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 14:30 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 14:27 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 14:27 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:26 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 14:26 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:26 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2005.wikimedia.org with OS bullseye
  • 14:18 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided) (duration: 00m 45s)
  • 14:17 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided)
  • 14:10 akosiaris@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in codfw: codfw row D switches upgrade done - T335042
  • 14:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
  • 14:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
  • 13:54 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: codfw row D switches upgrade done - T335042
  • 13:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye
  • 13:49 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-eqiad
  • 13:46 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 13:46 Emperor: repool ms-fe2012 T335042
  • 13:45 oblivian@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-eqiad
  • 13:39 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=eventschemas,dc=codfw,name=schema2004.codfw.wmnet
  • 13:39 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=eventschemas,dc=codfw,name=schema2004.eqiad.wmnet
  • 13:33 mvernon@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2003.codfw.wmnet,service=thanos-web
  • 13:33 mvernon@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2003.codfwm.wmnet,service=thanos-web
  • 13:32 taavi@deploy1002: Finished scap: Backport for Add stream config for mobile apps schema (T336508) (duration: 09m 08s)
  • 13:32 Emperor: repool thanos-fe2003 T335042
  • 13:30 sukhe: running authdns-update to repool codfw
  • 13:26 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2006.wikimedia.org
  • 13:25 taavi@deploy1002: mazevedo and taavi: Backport for Add stream config for mobile apps schema (T336508) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:25 moritzm: enabled Puppet in codfw/esams/ulsfo for switch maintenance T335042
  • 13:23 taavi@deploy1002: Started scap: Backport for Add stream config for mobile apps schema (T336508)
  • 13:01 XioNoX: asw-d-codfw> request system reboot all-members - T335042
  • 12:52 Emperor: depool ms-fe2012 T335042
  • 12:51 Emperor: depool thanos-fe2003 T335042
  • 12:50 moritzm: disabling Puppet in codfw/esams/ulsfo for switch maintenance T335042
  • 12:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 189 hosts with reason: codfw row D upgrade
  • 12:46 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 189 hosts with reason: codfw row D upgrade
  • 12:45 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1009.eqiad.wmnet
  • 12:39 akosiaris: reboot rdb1009 for kernel upgrades: possibly affected apps: netbox, changeprop, cpjobqueue, api-gateway, redisLockManager. Should be harmless however
  • 12:39 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1009.eqiad.wmnet
  • 12:35 godog: start cadvisor 0.44 upgrade to buster hosts - T336740
  • 12:29 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2a0b1f2] (duration: 01m 30s)
  • 12:28 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2a0b1f2]
  • 12:27 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2] (duration: 00m 04s)
  • 12:27 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2]
  • 12:24 sukhe: [done] running authdns-update to disable codfw for switch upgrade: T335042
  • 12:22 sukhe: running authdns-update to disable codfw for switch upgrade: T335042
  • 12:21 XioNoX: disable ping offload in codfw - T335042
  • 12:20 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 12:15 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 12:15 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2] (duration: 00m 10s)
  • 12:15 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2] (thin): Regular analytics weekly train THIN [analytics/refinery@2a0b1f2]
  • 12:09 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 12:06 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 12:04 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 12:02 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 11:59 kart_: Updated cxserver to 2023-05-16-061239-production (T336657)
  • 11:57 XioNoX: stage upgrade on asw-d-codfw - T335042
  • 11:56 joal@deploy1002: Finished deploy [analytics/refinery@2a0b1f2]: Regular analytics weekly train [analytics/refinery@2a0b1f2] (duration: 10m 45s)
  • 11:56 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 11:55 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 11:55 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 11:55 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 11:53 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 11:52 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 11:51 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-codfw
  • 11:50 marostegui: install 10.4.29 on db1151 T336462
  • 11:50 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 11:49 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 11:47 oblivian@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-codfw
  • 11:46 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 11:46 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 11:45 joal@deploy1002: Started deploy [analytics/refinery@2a0b1f2]: Regular analytics weekly train [analytics/refinery@2a0b1f2]
  • 11:44 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:43 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:30 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host testvm2002.codfw.wmnet with OS bookworm
  • 11:24 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
  • 11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 14 hosts with reason: maintenance
  • 11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 14 hosts with reason: maintenance
  • 11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 11 hosts with reason: maintenance
  • 11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 11 hosts with reason: maintenance
  • 11:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: maintenance
  • 11:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 13 hosts with reason: maintenance
  • 11:20 akosiaris: reboot rdb2007 for kernel upgrades: possibly affected apps: netbox, changeprop, cpjobqueue, api-gateway, redisLockManager. Should be harmless however
  • 11:18 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2002.codfw.wmnet with OS bookworm
  • 11:17 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host testvm2004.codfw.wmnet with OS bookworm
  • 11:16 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
  • 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2004.codfw.wmnet with OS bookworm
  • 11:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
  • 11:00 moritzm: updated bookworm image to RC3 T330495
  • 10:59 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet
  • 10:58 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
  • 10:58 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1010.eqiad.wmnet
  • 10:52 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:52 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:51 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
  • 10:51 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
  • 10:51 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet
  • 10:50 akosiaris@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1010.eqiad.wmnet
  • 10:50 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:49 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 10:48 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
  • 10:48 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter status all services in all: None - None
  • 10:48 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
  • 10:48 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter status all services in all: None - None
  • 10:48 akosiaris@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) depool all active/active services in codfw: codfw row D switches upgrade - T335042
  • 10:43 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host gitlab-runner1003.eqiad.wmnet
  • 10:40 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 10:39 jayme@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 10:39 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 10:38 jayme@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 10:36 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:36 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:35 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on mc-wf[2001-2002].codfw.wmnet,mc-wf[1001-1002].eqiad.wmnet with reason: kernel upgrade
  • 10:34 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on mc-wf[2001-2002].codfw.wmnet,mc-wf[1001-1002].eqiad.wmnet with reason: kernel upgrade
  • 10:34 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:34 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new VIP records for k8s-ingress-ml-serve - elukey@cumin1001"
  • 10:33 vgutierrez: testing HAProxy 2.7.8 in cp4052 and cp5032 (upload) - T317799
  • 10:33 elukey@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new VIP records for k8s-ingress-ml-serve - elukey@cumin1001"
  • 10:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:29 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter depool all active/active services in codfw: codfw row D switches upgrade - T335042
  • 10:28 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 10:13 Amir1: cleaning up echo notification table in all wikis (T318523)
  • 10:07 elukey@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:06 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:49 btullis@deploy1002: Finished deploy [airflow-dags/analytics_product@7642b62]: (no justification provided) (duration: 00m 09s)
  • 09:49 btullis@deploy1002: Started deploy [airflow-dags/analytics_product@7642b62]: (no justification provided)
  • 09:38 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
  • 09:31 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1004.eqiad.wmnet
  • 09:25 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1004.eqiad.wmnet
  • 09:23 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.reboot-runner (exit_code=1) rolling reboot on A:gitlab-runner
  • 09:23 jnuche@deploy1002: Installing scap version "4.52.2" for 595 hosts
  • 09:21 marostegui: Optimize s5 on dbstore1003 T336733
  • 08:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on es2034.codfw.wmnet with reason: Maintenance
  • 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on es2034.codfw.wmnet with reason: Maintenance
  • 08:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on es2033.codfw.wmnet with reason: Maintenance
  • 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on es2033.codfw.wmnet with reason: Maintenance
  • 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on es[2023-2025].codfw.wmnet with reason: maintenance
  • 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7:00:00 on es[2023-2025].codfw.wmnet with reason: maintenance
  • 08:18 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica2006.wikimedia.org
  • 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2014.codfw.wmnet with reason: Maintenance
  • 08:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc2014.codfw.wmnet with reason: Maintenance
  • 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy2004.codfw.wmnet with reason: Maintenance
  • 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy2004.codfw.wmnet with reason: Maintenance
  • 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy2003.codfw.wmnet with reason: Maintenance
  • 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy2003.codfw.wmnet with reason: Maintenance
  • 07:52 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
  • 07:28 Emperor: restart vopsbot.service on alert1001
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48254 and previous config saved to /var/cache/conftool/dbconfig/20230516-071509-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48253 and previous config saved to /var/cache/conftool/dbconfig/20230516-071453-root.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48252 and previous config saved to /var/cache/conftool/dbconfig/20230516-070005-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48251 and previous config saved to /var/cache/conftool/dbconfig/20230516-065948-root.json
  • 06:57 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 06:56 marostegui@deploy1002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master" (duration: 06m 58s)
  • 06:51 marostegui@deploy1002: marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master" synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 06:50 eileen: civicrm: revision d97a371e, config 686d3cb4
  • 06:49 marostegui@deploy1002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master"
  • 06:49 _joe_: running docker image prune -a in build2001
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48250 and previous config saved to /var/cache/conftool/dbconfig/20230516-064500-root.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48249 and previous config saved to /var/cache/conftool/dbconfig/20230516-064444-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48248 and previous config saved to /var/cache/conftool/dbconfig/20230516-062955-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48247 and previous config saved to /var/cache/conftool/dbconfig/20230516-062939-root.json
  • 06:24 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc3 master (duration: 07m 08s)
  • 06:24 eileen: civicrm upgraded from ef7b3822 to d97a371e
  • 06:18 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc3 master synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 06:17 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc3 master
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48246 and previous config saved to /var/cache/conftool/dbconfig/20230516-061450-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48245 and previous config saved to /var/cache/conftool/dbconfig/20230516-061434-root.json
  • 06:05 marostegui@deploy1002: Finished scap: Backport for Revert "ProductionServices.php: Failover pc3 codfw host" (duration: 07m 21s)
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48244 and previous config saved to /var/cache/conftool/dbconfig/20230516-055946-root.json
  • 05:59 marostegui@deploy1002: marostegui: Backport for Revert "ProductionServices.php: Failover pc3 codfw host" synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48243 and previous config saved to /var/cache/conftool/dbconfig/20230516-055929-root.json
  • 05:58 marostegui@deploy1002: Started scap: Backport for Revert "ProductionServices.php: Failover pc3 codfw host"
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 T336332', diff saved to https://phabricator.wikimedia.org/P48242 and previous config saved to /var/cache/conftool/dbconfig/20230516-055122-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48241 and previous config saved to /var/cache/conftool/dbconfig/20230516-054441-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48240 and previous config saved to /var/cache/conftool/dbconfig/20230516-054425-root.json
  • 05:43 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Failover pc3 codfw host (duration: 07m 15s)
  • 05:38 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Failover pc3 codfw host synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 05:36 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Failover pc3 codfw host
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48239 and previous config saved to /var/cache/conftool/dbconfig/20230516-052936-root.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48238 and previous config saved to /var/cache/conftool/dbconfig/20230516-052920-root.json
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1221 T336337', diff saved to https://phabricator.wikimedia.org/P48237 and previous config saved to /var/cache/conftool/dbconfig/20230516-052026-root.json
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 T336337', diff saved to https://phabricator.wikimedia.org/P48236 and previous config saved to /var/cache/conftool/dbconfig/20230516-052014-root.json
  • 03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.6, 1.41.0-wmf.7 (duration: 02m 26s)
  • 03:51 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.9 refs T330215 (duration: 48m 47s)
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.9 refs T330215
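
The db1121 and db1221 repooling entries above step through increasing percentages at regular intervals. The following is a minimal sketch of how that ramp could be recovered from the log text. It assumes entries are supplied oldest first with the bullet prefix stripped. `POOL_RE`, `repool_steps`, and `step_intervals` are hypothetical names written for this illustration, not part of dbctl or any Wikimedia tooling. The sample entries are the first four db1121 steps, copied from the log.

```python
import re
from datetime import datetime

# First four db1121 repooling entries from the 2023-05-16 log, oldest first,
# with the leading bullet removed.
ENTRIES = [
    "05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48238 and previous config saved to /var/cache/conftool/dbconfig/20230516-052920-root.json",
    "05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48240 and previous config saved to /var/cache/conftool/dbconfig/20230516-054425-root.json",
    "05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48243 and previous config saved to /var/cache/conftool/dbconfig/20230516-055929-root.json",
    "06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48245 and previous config saved to /var/cache/conftool/dbconfig/20230516-061434-root.json",
]

# Hypothetical pattern for this sketch: captures HH:MM, host, and percentage.
POOL_RE = re.compile(
    r"^(\d{2}:\d{2}) \S+: dbctl commit \(dc=all\): '(\S+) \(re\)pooling @ (\d+)%"
)


def repool_steps(entries):
    """Return (time, host, percent) for every repooling entry that matches."""
    steps = []
    for line in entries:
        match = POOL_RE.match(line)
        if match:
            hhmm, host, pct = match.groups()
            steps.append((datetime.strptime(hhmm, "%H:%M"), host, int(pct)))
    return steps


def step_intervals(steps):
    """Whole minutes between consecutive steps (same day, oldest first)."""
    return [
        int((later[0] - earlier[0]).total_seconds() // 60)
        for earlier, later in zip(steps, steps[1:])
    ]


steps = repool_steps(ENTRIES)
percents = [pct for _, _, pct in steps]
intervals = step_intervals(steps)
print(percents, intervals)
```

The sketch only reads HH:MM timestamps, so it cannot order two entries logged in the same minute and does not handle a ramp that crosses midnight.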

2023-05-15

  • 23:37 eileen: civicrm upgraded from db6e8d69 to ef7b3822
  • 22:02 maryum: deployed patch for T323651
  • 21:51 maryum: Deployed patch for T335612
  • 21:42 ejegg: payments-wiki upgraded from c0da741f to 8988a598 (and globalcollect settings deleted)
  • 20:00 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:00 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:56 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:56 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
  • 19:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1003.eqiad.wmnet
  • 19:50 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086]* for row D switch upgrade - bking@cumin1001 - T335042
  • 19:50 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086]* for row D switch upgrade - bking@cumin1001 - T335042
  • 19:50 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
  • 19:49 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086] for row D switch upgrade - bking@cumin1001 - T335042
  • 19:49 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic[2050-2054,2060,2067-2068,2072,2084-2086] for row D switch upgrade - bking@cumin1001 - T335042
  • 19:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1003.eqiad.wmnet
  • 19:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 2:00:00 on 20 hosts with reason: T335042 maintenance
  • 19:47 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 2:00:00 on 20 hosts with reason: T335042 maintenance
  • 19:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
  • 19:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1002.eqiad.wmnet
  • 19:33 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1002.eqiad.wmnet
  • 19:32 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
  • 19:28 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5] (wcqs): deploy 0.3.124 to WCQS (duration: 02m 03s)
  • 19:26 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5] (wcqs): deploy 0.3.124 to WCQS
  • 19:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
  • 19:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
  • 19:19 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: (no justification provided) (duration: 00m 05s)
  • 19:19 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: (no justification provided)
  • 19:18 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: (no justification provided) (duration: 00m 05s)
  • 19:18 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: (no justification provided)
  • 19:18 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: (no justification provided) (duration: 05m 46s)
  • 19:15 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
  • 19:15 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
  • 19:12 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: (no justification provided)
  • 19:12 bking@deploy1002: Finished deploy [wdqs/wdqs@41174d5]: 0.3.124 (duration: 10m 05s)
  • 19:03 inflatador: [WDQS Deploy] Tests passing following deploy of `0.3.124` on canary `wdqs1003`; proceeding to rest of fleet
  • 19:02 bking@deploy1002: Started deploy [wdqs/wdqs@41174d5]: 0.3.124
  • 18:54 mutante: LDAP - added uid 'adee' to groups wmde and nda - T336434
  • 18:54 sukhe: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.48 208.80.153.10 ]: codfw row D maint 2023/05/16 [dns2002] T335042
  • 18:33 brett: Rolling out maglev LVS scheduler in eqsin - T263797
  • 18:11 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns2005.wikimedia.org with OS bullseye
  • 18:11 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye
  • 18:06 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns2005.wikimedia.org with OS bullseye
  • 18:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye
  • 17:47 volans@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 17:47 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:47 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin2002"
  • 17:46 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin2002"
  • 17:42 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 17:42 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:42 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin2002"
  • 17:41 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin2002"
  • 17:39 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 17:39 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 17:30 volans@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 17:30 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:30 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin2002"
  • 17:29 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - volans@cumin2002"
  • 17:27 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 17:27 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:27 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin2002"
  • 17:26 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - volans@cumin2002"
  • 17:15 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 17:15 volans@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 15:00 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Setup Incomplete
  • 15:00 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Setup Incomplete
  • 14:24 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: testing transferpy cookbook
  • 14:24 bking@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: testing transferpy cookbook
  • 14:21 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 14:20 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:20 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:17 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:16 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:03 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 14:00 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 13:56 volans: re-enabled puppet on the install hosts to deploy changes for T336485
  • 13:45 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 13:33 volans: disabling puppet on the install hosts to deploy changes for T336485
  • 13:00 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:00 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:58 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:58 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023 (T335845)', diff saved to https://phabricator.wikimedia.org/P48228 and previous config saved to /var/cache/conftool/dbconfig/20230515-111624-ladsgroup.json
  • 11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023', diff saved to https://phabricator.wikimedia.org/P48227 and previous config saved to /var/cache/conftool/dbconfig/20230515-110118-ladsgroup.json
  • 10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023', diff saved to https://phabricator.wikimedia.org/P48226 and previous config saved to /var/cache/conftool/dbconfig/20230515-104611-ladsgroup.json
  • 10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1023 (T335845)', diff saved to https://phabricator.wikimedia.org/P48225 and previous config saved to /var/cache/conftool/dbconfig/20230515-103105-ladsgroup.json
  • 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1023 (T335845)', diff saved to https://phabricator.wikimedia.org/P48224 and previous config saved to /var/cache/conftool/dbconfig/20230515-102038-ladsgroup.json
  • 10:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1023.eqiad.wmnet with reason: Maintenance
  • 10:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1023.eqiad.wmnet with reason: Maintenance
  • 10:19 Amir1: Removing db1123 from zarcillo T334910
  • 10:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1123.eqiad.wmnet
  • 10:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1123.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1020 (T335845)', diff saved to https://phabricator.wikimedia.org/P48223 and previous config saved to /var/cache/conftool/dbconfig/20230515-101329-ladsgroup.json
  • 10:13 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1123.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 10:11 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 10:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1123.eqiad.wmnet
  • 09:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1020', diff saved to https://phabricator.wikimedia.org/P48222 and previous config saved to /var/cache/conftool/dbconfig/20230515-095823-ladsgroup.json
  • 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Remove db1123 from dbctl T334910', diff saved to https://phabricator.wikimedia.org/P48221 and previous config saved to /var/cache/conftool/dbconfig/20230515-095412-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1123 T334910', diff saved to https://phabricator.wikimedia.org/P48220 and previous config saved to /var/cache/conftool/dbconfig/20230515-094938-ladsgroup.json
  • 09:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1020', diff saved to https://phabricator.wikimedia.org/P48219 and previous config saved to /var/cache/conftool/dbconfig/20230515-094317-ladsgroup.json
  • 09:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15802
  • 09:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15802
  • 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1020 (T335845)', diff saved to https://phabricator.wikimedia.org/P48218 and previous config saved to /var/cache/conftool/dbconfig/20230515-092810-ladsgroup.json
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1020 (T335845)', diff saved to https://phabricator.wikimedia.org/P48217 and previous config saved to /var/cache/conftool/dbconfig/20230515-091139-ladsgroup.json
  • 09:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1020.eqiad.wmnet with reason: Maintenance
  • 09:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1020.eqiad.wmnet with reason: Maintenance
  • 09:08 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 09:05 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 08:45 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:45 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 08:26 elukey: restart pybal on lvs2010 and lvs2009 to pick up new LVS VIP for ml-staging k8s ingress - T335756
  • 08:26 volans: installed spicerack_7.1.0 on cumin1001
  • 08:22 volans: installed spicerack_7.1.0 on cumin2002
  • 08:08 volans: uploaded spicerack_7.1.0 to apt.wikimedia.org bullseye-wikimedia
  • 05:36 _joe_: building bookworm image for the first time T335560
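
Cookbook runs appear in the log as a START entry followed later by an END entry with a status and exit code. The following is a rough sketch of pairing the two to estimate run durations. It assumes newest-first entries with the bullet prefix stripped, as in this log. `START_RE`, `END_RE`, and `pair_runs` are hypothetical names for this illustration and are not part of spicerack or the cookbook tooling. The sample entries are the mc-gp1001 and mc-gp2001 reboots from 2023-05-15.

```python
import re
from datetime import datetime

# Four entries from the 2023-05-15 log, newest first, leading bullet removed.
ENTRIES = [
    "19:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet",
    "19:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet",
    "19:15 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet",
    "19:15 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet",
]

# Hypothetical patterns for this sketch, based only on the entry shapes above.
START_RE = re.compile(r"^(\d{2}:\d{2}) (\S+): START - Cookbook (\S+) (.*)$")
END_RE = re.compile(
    r"^(\d{2}:\d{2}) (\S+): END \((\w+)\) - Cookbook (\S+) \(exit_code=(\d+)\) ?(.*)$"
)


def pair_runs(entries):
    """Match each END entry to its START; return (cookbook, args, status, code, minutes)."""
    open_runs = {}
    runs = []
    for line in reversed(entries):  # walk oldest first
        match = START_RE.match(line)
        if match:
            hhmm, who, cookbook, args = match.groups()
            open_runs[(who, cookbook, args)] = datetime.strptime(hhmm, "%H:%M")
            continue
        match = END_RE.match(line)
        if match:
            hhmm, who, status, cookbook, code, args = match.groups()
            started = open_runs.pop((who, cookbook, args), None)
            if started is not None:
                ended = datetime.strptime(hhmm, "%H:%M")
                minutes = int((ended - started).total_seconds() // 60)
                runs.append((cookbook, args, status, int(code), minutes))
    return runs


runs = pair_runs(ENTRIES)
for run in runs:
    print(run)
```

Durations are accurate only to the minute, and an END entry whose START falls outside the supplied entries (or on the previous day) is skipped rather than guessed.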

2023-05-12

  • 22:59 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 22:35 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update cloudswift ip address - pt1979@cumin2002"
  • 22:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update cloudswift ip address - pt1979@cumin2002"
  • 22:32 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 22:31 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 21:05 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS buster
  • 21:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS buster
  • 20:08 mutante: gerrit1001 - systemctl mask gerrit T326368
  • 18:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 18:13 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 18:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 18:08 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 18:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudswift1001']
  • 17:59 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudswift1001']
  • 17:59 sukhe: running authdns-update for CR 919388
  • 17:31 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767 (duration: 150m 34s)
  • 17:27 sukhe: set routing-options static route 208.80.153.240/28 [high-traffic2, codfw] next-hop 10.192.16.140: T326767
  • 17:21 sukhe: restart pybal on lvs2012 to pick up bgp med change: T326767
  • 17:11 sukhe: homer "cr*-codfw*" commit "Gerrit: 917924 add new LVS host lvs2012": T326767
  • 17:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 16:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2012.codfw.wmnet with OS bullseye
  • 16:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 16:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
  • 16:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on lists1003.wikimedia.org with reason: maintenance
  • 16:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on lists1003.wikimedia.org with reason: maintenance
  • 16:23 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
  • 16:08 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 16:08 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
  • 16:03 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 16:00 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
  • 15:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 15:54 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2012.codfw.wmnet with OS bullseye
  • 15:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 15:09 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
  • 15:08 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 15:02 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dns2001.wikimedia.org
  • 15:02 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:01 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 15:01 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767
  • 14:56 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns2001.wikimedia.org
  • 14:39 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:38 cdanis: silencing jobrunner/videoscaler probes for the weekend -- silence ID 21903b52-047b-43d9-94be-908a4b92b5a7
  • 14:38 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 14:36 cdanis: silencing jobrunner/videoscaler probes for the weekend
  • 14:35 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dns2001.wikimedia.wmnet
  • 14:35 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:35 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2001.wikimedia.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 14:34 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns2001.wikimedia.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 14:29 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 14:24 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns2001.wikimedia.wmnet
  • 14:15 sukhe: [done] homer "cr*-codfw*" commit "Gerrit: 917364 remove to-be decommissioned host dns2001": T335777
  • 14:13 sukhe: homer "cr*-codfw*" commit "Gerrit: 917364 remove to-be decommissioned host dns2001": T335777
  • 13:54 sukhe: enable puppet and run agent in A:dns-rec: done deploying CR 919067
  • 13:38 sukhe: disable puppet on A:dns-rec to merge CR 919067
  • 13:22 sukhe: sudo cumin -b1 -s1200 'A:cp and A:eqiad' 'varnish-frontend-restart': T253093
  • 13:06 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 13:06 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 12:46 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 12:45 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 12:37 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 12:26 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 11:58 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 11:56 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48215 and previous config saved to /var/cache/conftool/dbconfig/20230512-113514-root.json
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48213 and previous config saved to /var/cache/conftool/dbconfig/20230512-112010-root.json
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48212 and previous config saved to /var/cache/conftool/dbconfig/20230512-110505-root.json
  • 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48211 and previous config saved to /var/cache/conftool/dbconfig/20230512-105000-root.json
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48210 and previous config saved to /var/cache/conftool/dbconfig/20230512-103455-root.json
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48209 and previous config saved to /var/cache/conftool/dbconfig/20230512-101950-root.json
  • 10:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2184.codfw.wmnet with reason: Maintenance
  • 10:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2184.codfw.wmnet with reason: Maintenance
  • 10:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 10:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48208 and previous config saved to /var/cache/conftool/dbconfig/20230512-100446-root.json
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48206 and previous config saved to /var/cache/conftool/dbconfig/20230512-094941-root.json
  • 09:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2131.codfw.wmnet with reason: Maintenance
  • 09:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2131.codfw.wmnet with reason: Maintenance
  • 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2131', diff saved to https://phabricator.wikimedia.org/P48205 and previous config saved to /var/cache/conftool/dbconfig/20230512-093950-root.json
  • 09:18 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.8 refs T330214
  • 09:13 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=mw1494.eqiad.wmnet
  • 09:13 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=mw146[7-9].eqiad.wmnet
  • 09:08 hashar@deploy1002: Finished scap: Backport for Reset the cached skin in RequestContext::setUser() (T336504) (duration: 16m 27s)
  • 08:54 hashar@deploy1002: hashar: Backport for Reset the cached skin in RequestContext::setUser() (T336504) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 08:52 hashar@deploy1002: Started scap: Backport for Reset the cached skin in RequestContext::setUser() (T336504)
  • 08:03 _joe_: restarting envoy on all jobrunners pooled in the jobrunner cluster T336554
  • 08:00 _joe_: do it also on mw1438
  • 07:59 _joe_: restarting envoyproxy on mw1439 to rebalance connections (see T336554)
  • 07:57 taavi@deploy1002: Finished scap: Backport for Disable Graph (again) (T336556) (duration: 12m 29s)
  • 07:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13335
  • 07:46 taavi@deploy1002: taavi: Backport for Disable Graph (again) (T336556) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 07:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13335
  • 07:45 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 07:44 taavi@deploy1002: Started scap: Backport for Disable Graph (again) (T336556)
  • 07:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 20940
  • 07:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 20940
  • 07:28 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 07:27 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary
  • 07:27 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
  • 05:33 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=mw1495.eqiad.wmnet
  • 05:32 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=mw1466.eqiad.wmnet
  • 05:32 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=mw1458.eqiad.wmnet
  • 05:31 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=eqiad,name=mw1461.eqiad.wmnet
  • 02:33 ejegg: payments-wiki upgraded from d1c5fefc to c0da741f
  • 02:32 ejegg: SmashPig upgraded from a9fa7a2c to 5460dbe2
  • 01:08 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts prometheus6001.drmrs.wmnet
  • 01:08 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:08 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus6001.drmrs.wmnet decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 01:07 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus6001.drmrs.wmnet decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 01:01 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 00:57 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus6001.drmrs.wmnet
  • 00:51 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts prometheus5001.eqsin.wmnet
  • 00:51 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:51 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 00:50 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 00:48 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 00:44 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus5001.eqsin.wmnet
  • 00:32 denisse: manually removing prometheus4001.ulsfo.wmnet from the Ganeti master after a failed step in the decommission cookbook - T335585
  • 00:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on prometheus3001.esams.wmnet with reason: maintenance
  • 00:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on prometheus3001.esams.wmnet with reason: maintenance

2023-05-11

  • 23:39 mutante: LDAP - added uid lorenjohnson to groups wmde and nda T335858
  • 23:39 mutante: LDAP - added uid roti to groups wmde and nda T336435
  • 23:24 rzl@cumin2002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw14(3[789]|4[056]|57)\.eqiad\.wmnet
  • 23:22 rzl@cumin2002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw14(5[89]|6[016789]|9[45])\.eqiad\.wmnet
  • 23:22 rzl@cumin2002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw14(3[789]|4[056]57)\.eqiad\.wmnet
  • 23:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudswift1002']
  • 22:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudswift1002']
  • 22:41 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudswift1001']
  • 22:23 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudswift1001']
  • 21:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudswift1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:45 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudswift1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudswift1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:10 eevans@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
  • 21:07 eevans@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-eqiad
  • 21:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts db1225.eqiad.wmnet
  • 21:07 eevans@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-codfw
  • 21:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1225.eqiad.wmnet
  • 21:05 eevans@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-codfw
  • 20:58 urbanecm@deploy1002: Finished scap: Backport for Personalized praise: Do not suggest users with Homepage disabled (T336300), Personalized praise: Do not suggest users with Homepage disabled (T336300) (duration: 07m 30s)
  • 20:52 urbanecm@deploy1002: urbanecm: Backport for Personalized praise: Do not suggest users with Homepage disabled (T336300), Personalized praise: Do not suggest users with Homepage disabled (T336300) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 20:51 urbanecm@deploy1002: Started scap: Backport for Personalized praise: Do not suggest users with Homepage disabled (T336300), Personalized praise: Do not suggest users with Homepage disabled (T336300)
  • 20:50 urbanecm@deploy1002: Finished scap: Backport for [Growth] Remove config variables provided by extension (duration: 20m 04s)
  • 20:37 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts prometheus4001.ulsfo.wment
  • 20:37 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:36 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 20:32 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus4001.ulsfo.wment
  • 20:31 urbanecm@deploy1002: urbanecm: Backport for [Growth] Remove config variables provided by extension synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:30 urbanecm@deploy1002: Started scap: Backport for [Growth] Remove config variables provided by extension
  • 20:24 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudswift1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:22 thcipriani@deploy1002: Finished scap: Backport for Allow http://localhost callback URL (T299737) (duration: 09m 37s)
  • 20:22 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts prometheus3001.esams.wment
  • 20:22 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:21 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus3001.esams.wment decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 20:20 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus3001.esams.wment decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 20:18 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 20:17 denisse: manually remove prometheus3001.esams.wmnet from the ganeti master after a failed step in the decommission cookbook.
  • 20:14 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus3001.esams.wment
  • 20:14 thcipriani@deploy1002: bd808 and thcipriani: Backport for Allow http://localhost callback URL (T299737) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:12 thcipriani@deploy1002: Started scap: Backport for Allow http://localhost callback URL (T299737)
  • 19:56 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts prometheus3001.esams.wment
  • 19:56 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:55 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 19:51 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus3001.esams.wment
  • 19:06 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 19:06 bking@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 18:46 ejegg: civicrm upgraded from d8a1a562 to db6e8d69
  • 17:46 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-airflow1006.eqiad.wmnet with reason: Silence error notifications/alerts during setup
  • 17:46 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-airflow1006.eqiad.wmnet with reason: Silence error notifications/alerts during setup
  • 17:24 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 17:12 brennen@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.8 refs T330214 (duration: 06m 14s)
  • 17:12 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 17:11 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 17:10 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 17:08 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 17:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 17:06 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 17:06 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 17:06 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.8 refs T330214
  • 17:05 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 17:01 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 17:00 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:58 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 16:58 bking@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 16:57 bking@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 16:56 bking@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 16:56 bking@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:56 bking@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:56 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 16:56 bking@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:55 bking@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 16:54 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 16:50 hashar: CI / Zuul was slow to report build results back to Gerrit, most probably due to lack of IPv6 (T336524), which should be solved now.
  • 16:48 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 16:48 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2020 (T335845)', diff saved to https://phabricator.wikimedia.org/P48203 and previous config saved to /var/cache/conftool/dbconfig/20230511-164125-ladsgroup.json
  • 16:37 brennen: train 1.41.0-wmf.8 (T330214): rolling back to group1 to test for T336504 presence/absence on enwiki
  • 16:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2020', diff saved to https://phabricator.wikimedia.org/P48201 and previous config saved to /var/cache/conftool/dbconfig/20230511-162619-ladsgroup.json
  • 16:16 elukey: benthos webrequest live instances migrated to kafka-franz (new consumer client, data may have some holes) - T331801
  • 16:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2020', diff saved to https://phabricator.wikimedia.org/P48200 and previous config saved to /var/cache/conftool/dbconfig/20230511-161113-ladsgroup.json
  • 16:08 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2002.wikimedia.org with OS bullseye
  • 16:01 Amir1: Removing db1110 from zarcillo T335011
  • 16:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1110.eqiad.wmnet
  • 16:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1110.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 15:58 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1110.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 15:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2020 (T335845)', diff saved to https://phabricator.wikimedia.org/P48199 and previous config saved to /var/cache/conftool/dbconfig/20230511-155607-ladsgroup.json
  • 15:49 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 15:48 hashar: CI back up and fully operational (after the Gerrit upgrade)
  • 15:48 mutante: gerrit maintenance period ended - gerrit switched to new hardware, IP and distro version
  • 15:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2020 (T335845)', diff saved to https://phabricator.wikimedia.org/P48198 and previous config saved to /var/cache/conftool/dbconfig/20230511-154533-ladsgroup.json
  • 15:45 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2002.wikimedia.org with reason: host reimage
  • 15:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2020.codfw.wmnet with reason: Maintenance
  • 15:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2020.codfw.wmnet with reason: Maintenance
  • 15:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1110.eqiad.wmnet
  • 15:42 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2002.wikimedia.org with reason: host reimage
  • 15:27 sukhe: [done] running homer for CR 919151: resolve connection issues to gerrit.wikimedia.org
  • 15:27 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab2002.wikimedia.org with OS bullseye
  • 15:21 sukhe: running homer for CR 919151: resolve connection issues to gerrit.wikimedia.org
  • 15:18 urandom: altering image_suggestions schema (generated data platform) — T336424
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2024 (T335845)', diff saved to https://phabricator.wikimedia.org/P48197 and previous config saved to /var/cache/conftool/dbconfig/20230511-144959-ladsgroup.json
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2024', diff saved to https://phabricator.wikimedia.org/P48195 and previous config saved to /var/cache/conftool/dbconfig/20230511-143453-ladsgroup.json
  • 14:27 moritzm: installing avahi security updates
  • 14:26 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2012
  • 14:26 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2012
  • 14:25 bking@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1003.wikimedia.org with reason: maintenance
  • 14:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1003.wikimedia.org with reason: maintenance
  • 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2024', diff saved to https://phabricator.wikimedia.org/P48194 and previous config saved to /var/cache/conftool/dbconfig/20230511-141947-ladsgroup.json
  • 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on gerrit1001.wikimedia.org with reason: maintenance
  • 14:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on gerrit1001.wikimedia.org with reason: maintenance
  • 14:15 bking@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:15 bking@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:15 sukhe: sudo cumin -b1 -s1200 'A:cp and A:codfw' 'varnish-frontend-restart': T253093
  • 14:11 bking@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:09 bking@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:08 bking@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:08 thcipriani: starting Gerrit Switchover (Take II): The Reckoning
  • 14:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2024 (T335845)', diff saved to https://phabricator.wikimedia.org/P48192 and previous config saved to /var/cache/conftool/dbconfig/20230511-140440-ladsgroup.json
  • 13:57 elukey: upgrade benthos (4.9.1 -> 4.15.0) on centrallog nodes - T331801
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2024 (T335845)', diff saved to https://phabricator.wikimedia.org/P48191 and previous config saved to /var/cache/conftool/dbconfig/20230511-135335-ladsgroup.json
  • 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2024.codfw.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2024.codfw.wmnet with reason: Maintenance
  • 13:49 moritzm: uploaded wmf-laptop 0.5.7 to component/wmf-sre-laptop
  • 13:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2002.codfw.wmnet with OS bullseye
  • 13:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:29 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
  • 13:26 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:26 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:26 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
  • 13:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:23 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:22 elukey: upload benthos 4.15.0-1 to {buster,bullseye}-wikimedia - T331801
  • 13:13 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host testvm2002.codfw.wmnet with OS bullseye
  • 13:07 filippo@cumin1001: conftool action : set/weight=100; selector: name=thanos-fe2004.codfw.wmnet
  • 13:07 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2004.codfw.wmnet,service=thanos-web
  • 13:07 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe2004.codfw.wmnet
  • 13:07 filippo@cumin1001: conftool action : set/pooled=true; selector: name=thanos-fe2004.eqiad.wmnet
  • 13:06 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe2004.eqiad.wmnet
  • 13:06 filippo@cumin1001: conftool action : set/weight=100; selector: name=thanos-fe2004.eqiad.wmnet
  • 13:05 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe1004.eqiad.wmnet,service=thanos-web
  • 13:05 filippo@cumin1001: conftool action : set/weight=100; selector: name=thanos-fe1004.eqiad.wmnet
  • 12:58 ladsgroup@deploy1002: Finished scap: Backport for Add outreachwiki to wikidataclient dblist (T171140) (duration: 11m 05s)
  • 12:54 godog: roll-restart thanos-fe swift-proxy to apply config changes - T336348
  • 12:48 ladsgroup@deploy1002: ladsgroup: Backport for Add outreachwiki to wikidataclient dblist (T171140) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 12:47 ladsgroup@deploy1002: Started scap: Backport for Add outreachwiki to wikidataclient dblist (T171140)
  • 12:41 Amir1: creating wikidata client tables for outreachwiki (T171140)
  • 12:18 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2002.wikimedia.org with OS bullseye
  • 12:01 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 11:57 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2002.wikimedia.org with reason: host reimage
  • 11:54 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2002.wikimedia.org with reason: host reimage
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48190 and previous config saved to /var/cache/conftool/dbconfig/20230511-115201-root.json
  • 11:39 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab2002.wikimedia.org with OS bullseye
  • 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48189 and previous config saved to /var/cache/conftool/dbconfig/20230511-113657-root.json
  • 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48187 and previous config saved to /var/cache/conftool/dbconfig/20230511-112152-root.json
  • 11:08 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48186 and previous config saved to /var/cache/conftool/dbconfig/20230511-110647-root.json
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48185 and previous config saved to /var/cache/conftool/dbconfig/20230511-105142-root.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48184 and previous config saved to /var/cache/conftool/dbconfig/20230511-103638-root.json
  • 10:24 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 10:23 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48183 and previous config saved to /var/cache/conftool/dbconfig/20230511-102133-root.json
  • 10:17 moritzm: installing modsecurity-crs security updates
  • 10:10 moritzm: installing protobuf security updates
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48182 and previous config saved to /var/cache/conftool/dbconfig/20230511-100628-root.json
  • 09:35 moritzm: installing distro-info-data updates on buster
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137', diff saved to https://phabricator.wikimedia.org/P48181 and previous config saved to /var/cache/conftool/dbconfig/20230511-092848-root.json
  • 08:59 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
  • 08:56 jelto@cumin1001: END (ERROR) - Cookbook sre.gitlab.upgrade (exit_code=97) on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 08:41 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:cassandra-dev
  • 08:40 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 08:40 elukey: `apt-get clean` on orespoolcounter nodes to free space in the root partition
  • 08:33 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
  • 08:13 moritzm: installing Linux 4.19.282 updates on Buster systems
  • 08:08 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.8 refs T330214
  • 08:06 jmm@cumin2002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:cassandra-dev
  • 08:05 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
  • 07:43 jmm@cumin2002: END (FAIL) - Cookbook sre.cassandra.roll-reboot (exit_code=1) rolling reboot on A:cassandra-dev
  • 07:43 jmm@cumin2002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:cassandra-dev
  • 07:41 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
  • 07:39 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'configure' for AS: 2518
  • 07:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2518
  • 07:14 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 20940
  • 07:13 marostegui@deploy1002: Finished scap: Backport for Revert "ProductionServices.php: Failover pc2 eqiad master" (duration: 07m 41s)
  • 07:07 marostegui@deploy1002: marostegui: Backport for Revert "ProductionServices.php: Failover pc2 eqiad master" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 07:05 marostegui@deploy1002: Started scap: Backport for Revert "ProductionServices.php: Failover pc2 eqiad master"
  • 06:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast2002.wikimedia.org
  • 06:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast2002.wikimedia.org
  • 06:42 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Failover pc2 eqiad master (duration: 08m 23s)
  • 06:36 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Failover pc2 eqiad master synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 06:34 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Failover pc2 eqiad master
  • 06:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 20940
  • 06:29 marostegui@deploy1002: Finished scap: Backport for Revert "ProductionServices.php: Failover pc2 codfw master" (duration: 08m 12s)
  • 06:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 17676
  • 06:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 17676
  • 06:22 marostegui@deploy1002: marostegui: Backport for Revert "ProductionServices.php: Failover pc2 codfw master" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 06:21 marostegui@deploy1002: Started scap: Backport for Revert "ProductionServices.php: Failover pc2 codfw master"
  • 06:21 XioNoX: Configure/reconfigure 1:1 NAT for new fr-tech hosts (frbast2002, frmon2002) - T336450
  • 06:15 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'configure' for AS: 13335
  • 06:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13335
  • 06:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 714
  • 06:07 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Failover pc2 codfw master (duration: 07m 42s)
  • 06:05 kart_: Updated MinT to 2023-05-11-051736-production
  • 06:01 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Failover pc2 codfw master synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 06:00 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 05:59 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Failover pc2 codfw master
  • 05:58 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Failover pc2 codfw master synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 05:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 714
  • 05:57 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Failover pc2 codfw master
  • 05:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 05:55 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 05:53 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 05:48 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2139.codfw.wmnet with reason: T335396
  • 05:48 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2139.codfw.wmnet with reason: T335396
  • 05:45 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 05:44 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply

2023-05-10

  • 22:08 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2021.codfw.wmnet with OS buster
  • 21:52 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2021.codfw.wmnet with reason: host reimage
  • 21:49 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2021.codfw.wmnet with reason: host reimage
  • 21:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS buster
  • 21:31 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2021.codfw.wmnet with OS buster
  • 21:31 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS buster
  • 20:58 ejegg: payments-wiki upgraded from 2125cea7 to d1c5fefc
  • 20:58 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2021.codfw.wmnet with OS bullseye
  • 20:55 milimetric@deploy1002: Finished deploy [airflow-dags/analytics@02d6ac9]: (no justification provided) (duration: 00m 11s)
  • 20:55 milimetric@deploy1002: Started deploy [airflow-dags/analytics@02d6ac9]: (no justification provided)
  • 20:33 hashar@deploy1002: Finished deploy [gerrit/gerrit@e815301]: Gerrit to 3.5.6 on gerrit1003 | T336339 (duration: 00m 06s)
  • 20:33 hashar@deploy1002: Started deploy [gerrit/gerrit@e815301]: Gerrit to 3.5.6 on gerrit1003 | T336339
  • 20:32 cjming: end of UTC late backport window
  • 20:21 cjming@deploy1002: Finished scap: Backport for Remove unnecessary jQuery closure (T324913) (duration: 09m 02s)
  • 20:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T335845)', diff saved to https://phabricator.wikimedia.org/P48177 and previous config saved to /var/cache/conftool/dbconfig/20230510-202014-ladsgroup.json
  • 20:14 cjming@deploy1002: cjming and jdlrobson: Backport for Remove unnecessary jQuery closure (T324913) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:12 cjming@deploy1002: Started scap: Backport for Remove unnecessary jQuery closure (T324913)
  • 20:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P48176 and previous config saved to /var/cache/conftool/dbconfig/20230510-200508-ladsgroup.json
  • 20:01 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS bullseye
  • 20:00 milimetric@deploy1002: Finished deploy [analytics/refinery@4ccc172] (thin): Regular analytics weekly train THIN [analytics/refinery@4ccc172] (duration: 00m 05s)
  • 20:00 milimetric@deploy1002: Started deploy [analytics/refinery@4ccc172] (thin): Regular analytics weekly train THIN [analytics/refinery@4ccc172]
  • 20:00 milimetric@deploy1002: Finished deploy [analytics/refinery@4ccc172] (thin): Regular analytics weekly train THIN [analytics/refinery@4ccc172] (duration: 00m 26s)
  • 19:59 milimetric@deploy1002: Started deploy [analytics/refinery@4ccc172] (thin): Regular analytics weekly train THIN [analytics/refinery@4ccc172]
  • 19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P48175 and previous config saved to /var/cache/conftool/dbconfig/20230510-195001-ladsgroup.json
  • 19:47 bking@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=codfw
  • 19:35 milimetric@deploy1002: Finished deploy [analytics/refinery@4ccc172]: Regular analytics weekly train [analytics/refinery@4ccc172] (duration: 40m 28s)
  • 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T335845)', diff saved to https://phabricator.wikimedia.org/P48174 and previous config saved to /var/cache/conftool/dbconfig/20230510-193455-ladsgroup.json
  • 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1219 (T335845)', diff saved to https://phabricator.wikimedia.org/P48173 and previous config saved to /var/cache/conftool/dbconfig/20230510-192746-ladsgroup.json
  • 19:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 19:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T335845)', diff saved to https://phabricator.wikimedia.org/P48172 and previous config saved to /var/cache/conftool/dbconfig/20230510-192722-ladsgroup.json
  • 19:25 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Rolling restart to apply Cassandra 3.11.14 upgrade - eevans@cumin1001
  • 19:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P48171 and previous config saved to /var/cache/conftool/dbconfig/20230510-191216-ladsgroup.json
  • 19:08 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Rolling restart to apply Cassandra 3.11.14 upgrade - eevans@cumin1001
  • 19:00 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Rolling restart to apply Cassandra 3.11.14 upgrade - eevans@cumin1001
  • 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P48170 and previous config saved to /var/cache/conftool/dbconfig/20230510-185710-ladsgroup.json
  • 18:54 milimetric@deploy1002: Started deploy [analytics/refinery@4ccc172]: Regular analytics weekly train [analytics/refinery@4ccc172]
  • 18:45 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767 (duration: 191m 53s)
  • 18:43 ejegg: payments-wiki upgraded from ec5a5e92 to 2125cea7
  • 18:43 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Rolling restart to apply Cassandra 3.11.14 upgrade - eevans@cumin1001
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T335845)', diff saved to https://phabricator.wikimedia.org/P48169 and previous config saved to /var/cache/conftool/dbconfig/20230510-184202-ladsgroup.json
  • 18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1218 (T335845)', diff saved to https://phabricator.wikimedia.org/P48168 and previous config saved to /var/cache/conftool/dbconfig/20230510-183441-ladsgroup.json
  • 18:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 18:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T335845)', diff saved to https://phabricator.wikimedia.org/P48167 and previous config saved to /var/cache/conftool/dbconfig/20230510-183418-ladsgroup.json
  • 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P48166 and previous config saved to /var/cache/conftool/dbconfig/20230510-181912-ladsgroup.json
  • 18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P48165 and previous config saved to /var/cache/conftool/dbconfig/20230510-180406-ladsgroup.json
  • 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T335845)', diff saved to https://phabricator.wikimedia.org/P48164 and previous config saved to /var/cache/conftool/dbconfig/20230510-174859-ladsgroup.json
  • 17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1207 (T335845)', diff saved to https://phabricator.wikimedia.org/P48163 and previous config saved to /var/cache/conftool/dbconfig/20230510-174143-ladsgroup.json
  • 17:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 17:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T335845)', diff saved to https://phabricator.wikimedia.org/P48162 and previous config saved to /var/cache/conftool/dbconfig/20230510-174119-ladsgroup.json
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P48161 and previous config saved to /var/cache/conftool/dbconfig/20230510-172613-ladsgroup.json
  • 17:23 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2012.codfw.wmnet with OS bullseye
  • 17:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
  • 17:11 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
  • 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P48160 and previous config saved to /var/cache/conftool/dbconfig/20230510-171107-ladsgroup.json
  • 16:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T335845)', diff saved to https://phabricator.wikimedia.org/P48159 and previous config saved to /var/cache/conftool/dbconfig/20230510-165601-ladsgroup.json
  • 16:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 16:50 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
  • 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T335845)', diff saved to https://phabricator.wikimedia.org/P48158 and previous config saved to /var/cache/conftool/dbconfig/20230510-164842-ladsgroup.json
  • 16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 16:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T335845)', diff saved to https://phabricator.wikimedia.org/P48157 and previous config saved to /var/cache/conftool/dbconfig/20230510-164818-ladsgroup.json
  • 16:36 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P48156 and previous config saved to /var/cache/conftool/dbconfig/20230510-163312-ladsgroup.json
  • 16:31 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
  • 16:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 16:25 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
  • 16:20 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P48155 and previous config saved to /var/cache/conftool/dbconfig/20230510-161806-ladsgroup.json
  • 16:15 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2012.codfw.wmnet with OS bullseye
  • 16:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
  • 16:03 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
  • 16:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T335845)', diff saved to https://phabricator.wikimedia.org/P48154 and previous config saved to /var/cache/conftool/dbconfig/20230510-160258-ladsgroup.json
  • 16:02 sukhe: sudo cumin -b1 -s1200 'A:cp and A:drmrs' 'varnish-frontend-restart': T253093
  • 15:59 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:58 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T335845)', diff saved to https://phabricator.wikimedia.org/P48153 and previous config saved to /var/cache/conftool/dbconfig/20230510-155429-ladsgroup.json
  • 15:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 15:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T335845)', diff saved to https://phabricator.wikimedia.org/P48152 and previous config saved to /var/cache/conftool/dbconfig/20230510-155357-ladsgroup.json
  • 15:48 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 15:47 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
  • 15:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 15:42 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2012.codfw.wmnet with OS bullseye
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P48151 and previous config saved to /var/cache/conftool/dbconfig/20230510-153851-ladsgroup.json
  • 15:37 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 15:33 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P48150 and previous config saved to /var/cache/conftool/dbconfig/20230510-152345-ladsgroup.json
  • 15:17 sukhe: running authdns-update for CR 918527
  • 15:16 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 15:16 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 15:14 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 15:14 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 15:12 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 15:12 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T335845)', diff saved to https://phabricator.wikimedia.org/P48149 and previous config saved to /var/cache/conftool/dbconfig/20230510-150838-ladsgroup.json
  • 15:00 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 15:00 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 15:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T335845)', diff saved to https://phabricator.wikimedia.org/P48148 and previous config saved to /var/cache/conftool/dbconfig/20230510-150009-ladsgroup.json
  • 15:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 14:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T335845)', diff saved to https://phabricator.wikimedia.org/P48147 and previous config saved to /var/cache/conftool/dbconfig/20230510-145946-ladsgroup.json
  • 14:58 cwhite: install vopsbot 0.3.4 on alert2001 T329791
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P48146 and previous config saved to /var/cache/conftool/dbconfig/20230510-144440-ladsgroup.json
  • 14:44 moritzm: restarting FPM/Apache on mw canaries to pick up libxml2 updates
  • 14:41 moritzm: installing libxml2 security updates on buster
  • 14:40 thcipriani: stopping gerrit on gerrit1001
  • 14:40 thcipriani: stopping gerrit on gerrit1003
  • 14:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1003.wikimedia.org with reason: migration
  • 14:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1003.wikimedia.org with reason: migration
  • 14:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on gerrit1001.wikimedia.org with reason: migration
  • 14:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on gerrit1001.wikimedia.org with reason: migration
  • 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P48145 and previous config saved to /var/cache/conftool/dbconfig/20230510-142934-ladsgroup.json
  • 14:26 thcipriani: gerrit1003 switchover happening
  • 14:25 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 14:25 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T335845)', diff saved to https://phabricator.wikimedia.org/P48144 and previous config saved to /var/cache/conftool/dbconfig/20230510-141427-ladsgroup.json
  • 14:08 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 14:08 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T335845)', diff saved to https://phabricator.wikimedia.org/P48143 and previous config saved to /var/cache/conftool/dbconfig/20230510-140708-ladsgroup.json
  • 14:07 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary
  • 14:07 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
  • 14:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 14:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T335845)', diff saved to https://phabricator.wikimedia.org/P48142 and previous config saved to /var/cache/conftool/dbconfig/20230510-140644-ladsgroup.json
  • 14:02 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P48140 and previous config saved to /var/cache/conftool/dbconfig/20230510-135138-ladsgroup.json
  • 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P48139 and previous config saved to /var/cache/conftool/dbconfig/20230510-133632-ladsgroup.json
  • 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T335845)', diff saved to https://phabricator.wikimedia.org/P48138 and previous config saved to /var/cache/conftool/dbconfig/20230510-132126-ladsgroup.json
  • 13:19 taavi@deploy1002: Finished scap: Backport for [arwikisource] Replace the current logo with an identical HD version (T336193) (duration: 08m 00s)
  • 13:15 _joe_: rolling back vopsbot to 0.3.3
  • 13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T335845)', diff saved to https://phabricator.wikimedia.org/P48137 and previous config saved to /var/cache/conftool/dbconfig/20230510-131412-ladsgroup.json
  • 13:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 13:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T335845)', diff saved to https://phabricator.wikimedia.org/P48136 and previous config saved to /var/cache/conftool/dbconfig/20230510-131347-ladsgroup.json
  • 13:13 taavi@deploy1002: superpes and taavi: Backport for [arwikisource] Replace the current logo with an identical HD version (T336193) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 13:11 taavi@deploy1002: Started scap: Backport for [arwikisource] Replace the current logo with an identical HD version (T336193)
  • 13:06 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2002.codfw.wmnet
  • 13:03 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host netbox-dev2002.codfw.wmnet
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P48135 and previous config saved to /var/cache/conftool/dbconfig/20230510-125840-ladsgroup.json
  • 12:56 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp2002.wikimedia.org
  • 12:52 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp2002.wikimedia.org
  • 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P48134 and previous config saved to /var/cache/conftool/dbconfig/20230510-124334-ladsgroup.json
  • 12:30 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 12:30 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 12:29 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 12:29 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T335845)', diff saved to https://phabricator.wikimedia.org/P48133 and previous config saved to /var/cache/conftool/dbconfig/20230510-122828-ladsgroup.json
  • 12:27 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:27 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 12:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T335845)', diff saved to https://phabricator.wikimedia.org/P48132 and previous config saved to /var/cache/conftool/dbconfig/20230510-122316-ladsgroup.json
  • 12:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 12:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T335845)', diff saved to https://phabricator.wikimedia.org/P48131 and previous config saved to /var/cache/conftool/dbconfig/20230510-122253-ladsgroup.json
  • 12:13 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 12:13 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 12:12 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 12:11 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 12:10 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:10 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P48129 and previous config saved to /var/cache/conftool/dbconfig/20230510-120747-ladsgroup.json
  • 11:58 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 11:58 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 11:57 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 11:57 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 11:56 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 11:56 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P48128 and previous config saved to /var/cache/conftool/dbconfig/20230510-115241-ladsgroup.json
  • 11:49 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 11:49 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 11:46 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 11:46 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 11:43 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 11:43 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2005.codfw.wmnet with OS bookworm
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T335845)', diff saved to https://phabricator.wikimedia.org/P48127 and previous config saved to /var/cache/conftool/dbconfig/20230510-113734-ladsgroup.json
  • 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T335845)', diff saved to https://phabricator.wikimedia.org/P48126 and previous config saved to /var/cache/conftool/dbconfig/20230510-113215-ladsgroup.json
  • 11:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 11:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 11:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2005.codfw.wmnet with reason: host reimage
  • 11:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 11:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T335845)', diff saved to https://phabricator.wikimedia.org/P48125 and previous config saved to /var/cache/conftool/dbconfig/20230510-112855-ladsgroup.json
  • 11:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2005.codfw.wmnet with reason: host reimage
  • 11:18 _joe_: installing vopsbot 0.3.4 on alert1001 T329791
  • 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P48124 and previous config saved to /var/cache/conftool/dbconfig/20230510-111349-ladsgroup.json
  • 11:11 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2005.codfw.wmnet with OS bookworm
  • 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P48123 and previous config saved to /var/cache/conftool/dbconfig/20230510-105843-ladsgroup.json
  • 10:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T335845)', diff saved to https://phabricator.wikimedia.org/P48122 and previous config saved to /var/cache/conftool/dbconfig/20230510-104337-ladsgroup.json
  • 10:38 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T335845)', diff saved to https://phabricator.wikimedia.org/P48121 and previous config saved to /var/cache/conftool/dbconfig/20230510-103712-ladsgroup.json
  • 10:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 10:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T335845)', diff saved to https://phabricator.wikimedia.org/P48120 and previous config saved to /var/cache/conftool/dbconfig/20230510-103649-ladsgroup.json
  • 10:26 Amir1: Removing db1113 from zarcillo T336029
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220 (T335845)', diff saved to https://phabricator.wikimedia.org/P48119 and previous config saved to /var/cache/conftool/dbconfig/20230510-102302-ladsgroup.json
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P48118 and previous config saved to /var/cache/conftool/dbconfig/20230510-102142-ladsgroup.json
  • 10:21 Amir1: start of clean up of echo notification in wikidatawiki (T318523)
  • 10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1113.eqiad.wmnet
  • 10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1113.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 10:16 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1113.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 10:13 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 10:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1113.eqiad.wmnet
  • 10:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220', diff saved to https://phabricator.wikimedia.org/P48117 and previous config saved to /var/cache/conftool/dbconfig/20230510-100756-ladsgroup.json
  • 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P48116 and previous config saved to /var/cache/conftool/dbconfig/20230510-100636-ladsgroup.json
  • 10:01 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-fe2004.codfw.wmnet
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48115 and previous config saved to /var/cache/conftool/dbconfig/20230510-095309-root.json
  • 09:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220', diff saved to https://phabricator.wikimedia.org/P48114 and previous config saved to /var/cache/conftool/dbconfig/20230510-095250-ladsgroup.json
  • 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T335845)', diff saved to https://phabricator.wikimedia.org/P48113 and previous config saved to /var/cache/conftool/dbconfig/20230510-095130-ladsgroup.json
  • 09:50 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2004.codfw.wmnet
  • 09:50 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-fe1004.eqiad.wmnet
  • 09:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T335845)', diff saved to https://phabricator.wikimedia.org/P48112 and previous config saved to /var/cache/conftool/dbconfig/20230510-094452-ladsgroup.json
  • 09:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 09:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T335845)', diff saved to https://phabricator.wikimedia.org/P48111 and previous config saved to /var/cache/conftool/dbconfig/20230510-094429-ladsgroup.json
  • 09:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1004.eqiad.wmnet
  • 09:38 daniel@deploy1002: Finished scap: Backport for Enable parser cache warming jobs for parsoid on medium wikis (T329366) (duration: 08m 10s)
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48110 and previous config saved to /var/cache/conftool/dbconfig/20230510-093804-root.json
  • 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220 (T335845)', diff saved to https://phabricator.wikimedia.org/P48109 and previous config saved to /var/cache/conftool/dbconfig/20230510-093743-ladsgroup.json
  • 09:31 daniel@deploy1002: daniel: Backport for Enable parser cache warming jobs for parsoid on medium wikis (T329366) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48108 and previous config saved to /var/cache/conftool/dbconfig/20230510-093128-root.json
  • 09:30 daniel@deploy1002: Started scap: Backport for Enable parser cache warming jobs for parsoid on medium wikis (T329366)
  • 09:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P48107 and previous config saved to /var/cache/conftool/dbconfig/20230510-092923-ladsgroup.json
  • 09:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1220 (T335845)', diff saved to https://phabricator.wikimedia.org/P48106 and previous config saved to /var/cache/conftool/dbconfig/20230510-092531-ladsgroup.json
  • 09:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1220.eqiad.wmnet with reason: Maintenance
  • 09:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1220.eqiad.wmnet with reason: Maintenance
  • 09:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T335845)', diff saved to https://phabricator.wikimedia.org/P48105 and previous config saved to /var/cache/conftool/dbconfig/20230510-092507-ladsgroup.json
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48104 and previous config saved to /var/cache/conftool/dbconfig/20230510-092259-root.json
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48103 and previous config saved to /var/cache/conftool/dbconfig/20230510-091624-root.json
  • 09:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P48102 and previous config saved to /var/cache/conftool/dbconfig/20230510-091417-ladsgroup.json
  • 09:12 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet
  • 09:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P48101 and previous config saved to /var/cache/conftool/dbconfig/20230510-091001-ladsgroup.json
  • 09:09 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48100 and previous config saved to /var/cache/conftool/dbconfig/20230510-090755-root.json
  • 09:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
  • 09:01 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
  • 09:01 hashar: Gerrit restarted at version 3.5.6 | T336339
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48099 and previous config saved to /var/cache/conftool/dbconfig/20230510-090119-root.json
  • 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T335845)', diff saved to https://phabricator.wikimedia.org/P48098 and previous config saved to /var/cache/conftool/dbconfig/20230510-085910-ladsgroup.json
  • 08:57 hashar@deploy1002: Finished deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit1001 | T336339 (duration: 00m 05s)
  • 08:57 hashar@deploy1002: Started deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit1001 | T336339
  • 08:56 hashar@deploy1002: Finished deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit1001 | T336339 (duration: 00m 09s)
  • 08:56 hashar@deploy1002: Started deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit1001 | T336339
  • 08:55 hashar: Stopping Gerrit for 3.5.5 > 3.5.6 upgrade T336339
  • 08:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P48097 and previous config saved to /var/cache/conftool/dbconfig/20230510-085455-ladsgroup.json
  • 08:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T335845)', diff saved to https://phabricator.wikimedia.org/P48096 and previous config saved to /var/cache/conftool/dbconfig/20230510-085330-ladsgroup.json
  • 08:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 08:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48095 and previous config saved to /var/cache/conftool/dbconfig/20230510-085250-root.json
  • 08:51 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 08:49 hashar@deploy1002: Finished deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit2002 | T336339 (duration: 00m 07s)
  • 08:49 hashar@deploy1002: Started deploy [gerrit/gerrit@67ba7ab]: Gerrit to 3.5.6 on gerrit2002 | T336339
  • 08:48 hashar: deploy1002: git reset `/srv/deployment/gerrit/gerrit` which had a bunch of locally modified files for some reason # T336339
  • 08:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 08:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48094 and previous config saved to /var/cache/conftool/dbconfig/20230510-084614-root.json
  • 08:40 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 08:40 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 08:39 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary
  • 08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T335845)', diff saved to https://phabricator.wikimedia.org/P48093 and previous config saved to /var/cache/conftool/dbconfig/20230510-083948-ladsgroup.json
  • 08:39 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48092 and previous config saved to /var/cache/conftool/dbconfig/20230510-083745-root.json
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1137 (T335845)', diff saved to https://phabricator.wikimedia.org/P48091 and previous config saved to /var/cache/conftool/dbconfig/20230510-083253-ladsgroup.json
  • 08:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: Maintenance
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48090 and previous config saved to /var/cache/conftool/dbconfig/20230510-083109-root.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48089 and previous config saved to /var/cache/conftool/dbconfig/20230510-082240-root.json
  • 08:21 hashar@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.8 refs T330214 (duration: 05m 55s)
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48088 and previous config saved to /var/cache/conftool/dbconfig/20230510-081605-root.json
  • 08:15 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.8 refs T330214
  • 08:14 godog: re-enable eqsin remote syslog towards centrallog - T336345
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48087 and previous config saved to /var/cache/conftool/dbconfig/20230510-080736-root.json
  • 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.netbox.restart-reboot (exit_code=0) rolling reboot on A:netbox
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48086 and previous config saved to /var/cache/conftool/dbconfig/20230510-080100-root.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48085 and previous config saved to /var/cache/conftool/dbconfig/20230510-080003-root.json
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48084 and previous config saved to /var/cache/conftool/dbconfig/20230510-075957-root.json
  • 07:50 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
  • 07:49 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
  • 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host testvm2005.codfw.wmnet with OS bookworm
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48083 and previous config saved to /var/cache/conftool/dbconfig/20230510-074555-root.json
  • 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
  • 07:45 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
  • 07:45 jmm@cumin2002: START - Cookbook sre.netbox.restart-reboot rolling reboot on A:netbox
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48082 and previous config saved to /var/cache/conftool/dbconfig/20230510-074458-root.json
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48081 and previous config saved to /var/cache/conftool/dbconfig/20230510-074452-root.json
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2117 T334650', diff saved to https://phabricator.wikimedia.org/P48080 and previous config saved to /var/cache/conftool/dbconfig/20230510-074237-root.json
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48079 and previous config saved to /var/cache/conftool/dbconfig/20230510-073833-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48078 and previous config saved to /var/cache/conftool/dbconfig/20230510-072954-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48077 and previous config saved to /var/cache/conftool/dbconfig/20230510-072948-root.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48076 and previous config saved to /var/cache/conftool/dbconfig/20230510-072329-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48075 and previous config saved to /var/cache/conftool/dbconfig/20230510-071449-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48074 and previous config saved to /var/cache/conftool/dbconfig/20230510-071443-root.json
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48073 and previous config saved to /var/cache/conftool/dbconfig/20230510-070824-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48072 and previous config saved to /var/cache/conftool/dbconfig/20230510-065944-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48071 and previous config saved to /var/cache/conftool/dbconfig/20230510-065938-root.json
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48070 and previous config saved to /var/cache/conftool/dbconfig/20230510-065319-root.json
  • 06:52 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2005.codfw.wmnet with OS bookworm
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48069 and previous config saved to /var/cache/conftool/dbconfig/20230510-064439-root.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48068 and previous config saved to /var/cache/conftool/dbconfig/20230510-064433-root.json
  • 06:44 marostegui: dbmaint eqiad failover s3 sanitarium master T336252
  • 06:41 marostegui@cumin2002: dbctl commit (dc=all): 'Depool db1112 db1212 T336252', diff saved to https://phabricator.wikimedia.org/P48067 and previous config saved to /var/cache/conftool/dbconfig/20230510-064119-marostegui.json
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48066 and previous config saved to /var/cache/conftool/dbconfig/20230510-063814-root.json
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48065 and previous config saved to /var/cache/conftool/dbconfig/20230510-062309-root.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48064 and previous config saved to /var/cache/conftool/dbconfig/20230510-060805-root.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2180', diff saved to https://phabricator.wikimedia.org/P48063 and previous config saved to /var/cache/conftool/dbconfig/20230510-060656-root.json
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P48062 and previous config saved to /var/cache/conftool/dbconfig/20230510-055929-root.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P48061 and previous config saved to /var/cache/conftool/dbconfig/20230510-055300-root.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2151', diff saved to https://phabricator.wikimedia.org/P48060 and previous config saved to /var/cache/conftool/dbconfig/20230510-054833-root.json
  • 05:42 kart_: Updated MinT to 2023-05-10-045734-production (T331505)
  • 05:42 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 05:37 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 05:35 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 05:32 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 05:28 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 05:26 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 04:10 mutante: gerrit1001 - rsyncing data over to gerrit1003, as root in a screen, but slowly with bwlimit 5m
  • 01:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 01:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"

2023-05-09

  • 23:43 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
  • 23:25 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
  • 23:22 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
  • 23:02 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
  • 23:00 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw
  • 22:46 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:42 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw
  • 22:38 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw
  • 22:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T335845)', diff saved to https://phabricator.wikimedia.org/P48058 and previous config saved to /var/cache/conftool/dbconfig/20230509-223346-ladsgroup.json
  • 22:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
  • 22:28 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
  • 22:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P48057 and previous config saved to /var/cache/conftool/dbconfig/20230509-221840-ladsgroup.json
  • 22:18 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw
  • 22:06 inflatador: bking@wcqs1002 depool wcqs1002 while it catches up on lag
  • 22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P48056 and previous config saved to /var/cache/conftool/dbconfig/20230509-220333-ladsgroup.json
  • 21:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T335845)', diff saved to https://phabricator.wikimedia.org/P48055 and previous config saved to /var/cache/conftool/dbconfig/20230509-214827-ladsgroup.json
  • 21:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 21:42 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams
  • 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2131 (T335845)', diff saved to https://phabricator.wikimedia.org/P48054 and previous config saved to /var/cache/conftool/dbconfig/20230509-213834-ladsgroup.json
  • 21:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: Maintenance
  • 21:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: Maintenance
  • 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096 (T335845)', diff saved to https://phabricator.wikimedia.org/P48053 and previous config saved to /var/cache/conftool/dbconfig/20230509-213808-ladsgroup.json
  • 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096', diff saved to https://phabricator.wikimedia.org/P48052 and previous config saved to /var/cache/conftool/dbconfig/20230509-212302-ladsgroup.json
  • 21:19 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams
  • 21:17 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams
  • 21:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096', diff saved to https://phabricator.wikimedia.org/P48051 and previous config saved to /var/cache/conftool/dbconfig/20230509-210755-ladsgroup.json
  • 20:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096 (T335845)', diff saved to https://phabricator.wikimedia.org/P48050 and previous config saved to /var/cache/conftool/dbconfig/20230509-205249-ladsgroup.json
  • 20:52 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams
  • 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2096 (T335845)', diff saved to https://phabricator.wikimedia.org/P48049 and previous config saved to /var/cache/conftool/dbconfig/20230509-204604-ladsgroup.json
  • 20:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2096.codfw.wmnet with reason: Maintenance
  • 20:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2096.codfw.wmnet with reason: Maintenance
  • 20:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 20:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 20:42 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs
  • 20:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 20:31 urbanecm@deploy1002: Finished scap: Backport for Add padding to limited-width toggle to account for close icon (T336274), Add padding to limited-width toggle to account for close icon (T336274) (duration: 08m 59s)
  • 20:24 urbanecm@deploy1002: urbanecm and jdrewniak: Backport for Add padding to limited-width toggle to account for close icon (T336274), Add padding to limited-width toggle to account for close icon (T336274) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:22 urbanecm@deploy1002: Started scap: Backport for Add padding to limited-width toggle to account for close icon (T336274), Add padding to limited-width toggle to account for close icon (T336274)
  • 20:22 urbanecm@deploy1002: Finished scap: Backport for Remove unused parsoidSettings, nativeGalleryEnabled (duration: 07m 11s)
  • 20:19 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs
  • 20:19 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs
  • 20:14 urbanecm@deploy1002: Started scap: Backport for Remove unused parsoidSettings, nativeGalleryEnabled
  • 20:10 urbanecm@deploy1002: Finished scap: Backport for [Growth] Add mediawiki.mentor_dashboard.personalized_praise stream (duration: 07m 26s)
  • 20:03 urbanecm@deploy1002: Started scap: Backport for [Growth] Add mediawiki.mentor_dashboard.personalized_praise stream
  • 20:01 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 19:54 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs
  • 19:34 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin
  • 19:08 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:08 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:07 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin
  • 18:57 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin
  • 18:45 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 18:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 18:28 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin
  • 18:06 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo
  • 18:01 aokoth@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host vrts2001.codfw.wmnet with OS bullseye
  • 17:49 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 17:49 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts2001.codfw.wmnet with reason: host reimage
  • 17:49 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 17:49 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 17:48 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 17:48 rzl@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 17:48 rzl@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 17:48 rzl@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 17:48 rzl@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 17:48 rzl@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 17:48 rzl@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 17:48 rzl@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 17:48 rzl@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 17:47 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:47 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
  • 17:47 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:47 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:46 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:46 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 17:46 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts2001.codfw.wmnet with reason: host reimage
  • 17:46 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 17:46 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 17:45 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 17:43 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 17:43 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 17:42 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 17:42 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 17:31 aokoth@cumin1001: START - Cookbook sre.ganeti.reimage for host vrts2001.codfw.wmnet with OS bullseye
  • 17:31 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts2001.codfw.wmnet with reason: Re-image w/ Bullseye
  • 17:31 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts2001.codfw.wmnet with reason: Re-image w/ Bullseye
  • 17:28 aokoth@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host vrts2001.codfw.wmnet with OS bullseye
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T335845)', diff saved to https://phabricator.wikimedia.org/P48048 and previous config saved to /var/cache/conftool/dbconfig/20230509-172826-ladsgroup.json
  • 17:20 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo
  • 17:17 rzl: rolling restart apache on eqiad appservers T225778
  • 17:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2012.codfw.wmnet with OS bullseye
  • 17:13 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P48047 and previous config saved to /var/cache/conftool/dbconfig/20230509-171320-ladsgroup.json
  • 17:12 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:11 rzl: rolling restart apache on codfw appservers T225778
  • 17:00 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
  • 17:00 brett@cumin2002: END (FAIL) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=1) rolling upgrade of HAProxy on A:cp-text_ulsfo
  • 17:00 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P48046 and previous config saved to /var/cache/conftool/dbconfig/20230509-165813-ladsgroup.json
  • 16:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
  • 16:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
  • 16:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
  • 16:54 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
  • 16:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:46 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudbackup2001-dev.codfw.wmnet with OS bullseye
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T335845)', diff saved to https://phabricator.wikimedia.org/P48045 and previous config saved to /var/cache/conftool/dbconfig/20230509-164307-ladsgroup.json
  • 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T335845)', diff saved to https://phabricator.wikimedia.org/P48044 and previous config saved to /var/cache/conftool/dbconfig/20230509-163646-ladsgroup.json
  • 16:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 16:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T335845)', diff saved to https://phabricator.wikimedia.org/P48043 and previous config saved to /var/cache/conftool/dbconfig/20230509-163621-ladsgroup.json
  • 16:35 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 16:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudbackup2001-dev.codfw.wmnet with OS bullseye
  • 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
  • 16:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['lvs2012']
  • 16:32 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
  • 16:30 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T335845)', diff saved to https://phabricator.wikimedia.org/P48042 and previous config saved to /var/cache/conftool/dbconfig/20230509-162904-ladsgroup.json
  • 16:27 rzl: resumed puppet on appservers - T225778
  • 16:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2012']
  • 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2012']
  • 16:23 rzl: rzl@mwdebug1001:~$ sudo apache2ctl restart
  • 16:22 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2012']
  • 16:21 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-coord1002.eqiad.wmnet
  • 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P48041 and previous config saved to /var/cache/conftool/dbconfig/20230509-162115-ladsgroup.json
  • 16:19 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 16:15 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 16:15 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-coord1002.eqiad.wmnet
  • 16:14 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P48039 and previous config saved to /var/cache/conftool/dbconfig/20230509-161358-ladsgroup.json
  • 16:11 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 16:10 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 16:10 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 16:09 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:09 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:08 jnuche@deploy1002: Installing scap version "4.52.1" for 593 hosts
  • 16:07 rzl: stopping puppet on appservers - T225778
  • 16:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs2012.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P48038 and previous config saved to /var/cache/conftool/dbconfig/20230509-160608-ladsgroup.json
  • 16:04 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts2001.codfw.wmnet with reason: host reimage
  • 16:01 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts2001.codfw.wmnet with reason: host reimage
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P48037 and previous config saved to /var/cache/conftool/dbconfig/20230509-155852-ladsgroup.json
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T335845)', diff saved to https://phabricator.wikimedia.org/P48036 and previous config saved to /var/cache/conftool/dbconfig/20230509-155102-ladsgroup.json
  • 15:50 aokoth@cumin1001: START - Cookbook sre.ganeti.reimage for host vrts2001.codfw.wmnet with OS bullseye
  • 15:48 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts2001.codfw.wmnet with reason: Re-image w/ Bullseye
  • 15:48 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts2001.codfw.wmnet with reason: Re-image w/ Bullseye
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T335845)', diff saved to https://phabricator.wikimedia.org/P48035 and previous config saved to /var/cache/conftool/dbconfig/20230509-154346-ladsgroup.json
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T335845)', diff saved to https://phabricator.wikimedia.org/P48034 and previous config saved to /var/cache/conftool/dbconfig/20230509-154338-ladsgroup.json
  • 15:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 15:43 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:43 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
  • 15:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T335845)', diff saved to https://phabricator.wikimedia.org/P48033 and previous config saved to /var/cache/conftool/dbconfig/20230509-154313-ladsgroup.json
  • 15:42 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudcontrol2001-dev - pt1979@cumin2002"
  • 15:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1223 (T335845)', diff saved to https://phabricator.wikimedia.org/P48032 and previous config saved to /var/cache/conftool/dbconfig/20230509-153715-ladsgroup.json
  • 15:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 15:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T335845)', diff saved to https://phabricator.wikimedia.org/P48031 and previous config saved to /var/cache/conftool/dbconfig/20230509-153651-ladsgroup.json
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P48030 and previous config saved to /var/cache/conftool/dbconfig/20230509-152804-ladsgroup.json
  • 15:23 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:23 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host lvs2012.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:22 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P48029 and previous config saved to /var/cache/conftool/dbconfig/20230509-152145-ladsgroup.json
  • 15:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for lvs2012 - pt1979@cumin2002"
  • 15:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for lvs2012 - pt1979@cumin2002"
  • 15:17 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P48028 and previous config saved to /var/cache/conftool/dbconfig/20230509-151258-ladsgroup.json
  • 15:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host testvm2005.codfw.wmnet with OS bookworm
  • 15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P48027 and previous config saved to /var/cache/conftool/dbconfig/20230509-150639-ladsgroup.json
  • 15:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2180']
  • 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T335845)', diff saved to https://phabricator.wikimedia.org/P48026 and previous config saved to /var/cache/conftool/dbconfig/20230509-145752-ladsgroup.json
  • 14:54 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767 (duration: 45m 45s)
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T335845)', diff saved to https://phabricator.wikimedia.org/P48025 and previous config saved to /var/cache/conftool/dbconfig/20230509-145133-ladsgroup.json
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T335845)', diff saved to https://phabricator.wikimedia.org/P48024 and previous config saved to /var/cache/conftool/dbconfig/20230509-145128-ladsgroup.json
  • 14:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 14:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 14:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 14:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T335845)', diff saved to https://phabricator.wikimedia.org/P48023 and previous config saved to /var/cache/conftool/dbconfig/20230509-145057-ladsgroup.json
  • 14:50 sukhe: homer "cr*-codfw*" commit "Gerrit: 917885 remove decommissioned host lvs2008"
  • 14:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2180']
  • 14:45 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs2008.codfw.wmnet
  • 14:45 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:45 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs2008.codfw.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1212 (T335845)', diff saved to https://phabricator.wikimedia.org/P48022 and previous config saved to /var/cache/conftool/dbconfig/20230509-144457-ladsgroup.json
  • 14:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 14:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T335845)', diff saved to https://phabricator.wikimedia.org/P48021 and previous config saved to /var/cache/conftool/dbconfig/20230509-144433-ladsgroup.json
  • 14:44 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs2008.codfw.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 14:41 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 14:37 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:37 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P48020 and previous config saved to /var/cache/conftool/dbconfig/20230509-143550-ladsgroup.json
  • 14:32 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs2008.codfw.wmnet
  • 14:32 sukhe: decommission lvs2008
  • 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P48019 and previous config saved to /var/cache/conftool/dbconfig/20230509-142927-ladsgroup.json
  • 14:29 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup1010
  • 14:29 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host backup1010
  • 14:29 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup1011
  • 14:29 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host backup1011
  • 14:27 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:25 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:24 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2180']
  • 14:23 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2180']
  • 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P48018 and previous config saved to /var/cache/conftool/dbconfig/20230509-142044-ladsgroup.json
  • 14:15 sukhe: set routing-options static route 208.80.153.240/28 next-hop 10.192.49.7 [move static route for high-traffic2 to lvs2010]: T335777
  • 14:15 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2005.codfw.wmnet with OS bookworm
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P48017 and previous config saved to /var/cache/conftool/dbconfig/20230509-141421-ladsgroup.json
  • 14:08 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T335845)', diff saved to https://phabricator.wikimedia.org/P48016 and previous config saved to /var/cache/conftool/dbconfig/20230509-140535-ladsgroup.json
  • 13:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T335845)', diff saved to https://phabricator.wikimedia.org/P48015 and previous config saved to /var/cache/conftool/dbconfig/20230509-135915-ladsgroup.json
  • 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T335845)', diff saved to https://phabricator.wikimedia.org/P48014 and previous config saved to /var/cache/conftool/dbconfig/20230509-135815-ladsgroup.json
  • 13:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 13:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T335845)', diff saved to https://phabricator.wikimedia.org/P48013 and previous config saved to /var/cache/conftool/dbconfig/20230509-135750-ladsgroup.json
  • 13:49 taavi@deploy1002: Finished scap: Backport for Add $wmgUseRealMe (T324535) (duration: 07m 51s)
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T335845)', diff saved to https://phabricator.wikimedia.org/P48012 and previous config saved to /var/cache/conftool/dbconfig/20230509-134952-ladsgroup.json
  • 13:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 13:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T335845)', diff saved to https://phabricator.wikimedia.org/P48011 and previous config saved to /var/cache/conftool/dbconfig/20230509-134929-ladsgroup.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2180 T336031', diff saved to https://phabricator.wikimedia.org/P48010 and previous config saved to /var/cache/conftool/dbconfig/20230509-134921-root.json
  • 13:44 moritzm: rearmed keyholder on netmon* post reboot
  • 13:43 taavi@deploy1002: taavi: Backport for Add $wmgUseRealMe (T324535) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:42 sukhe: sudo cumin -b1 -s1200 'A:cp and A:esams' 'varnish-frontend-restart': T253093
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P48009 and previous config saved to /var/cache/conftool/dbconfig/20230509-134244-ladsgroup.json
  • 13:42 taavi@deploy1002: Started scap: Backport for Add $wmgUseRealMe (T324535)
  • 13:38 taavi@deploy1002: Finished scap: Backport for Add RealMe to extension-list (T324535) (duration: 35m 47s)
  • 13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P48008 and previous config saved to /var/cache/conftool/dbconfig/20230509-133416-ladsgroup.json
  • 13:28 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-worker1088.eqiad.wmnet with reason: Replacing RAID controller battery
  • 13:28 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-client1001.eqiad.wmnet
  • 13:28 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on an-worker1088.eqiad.wmnet with reason: Replacing RAID controller battery
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P48007 and previous config saved to /var/cache/conftool/dbconfig/20230509-132737-ladsgroup.json
  • 13:27 moritzm: updated bookworm d-i image to 2023-05-09 daily build T330495
  • 13:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-client1001.eqiad.wmnet
  • 13:23 taavi@deploy1002: taavi: Backport for Add RealMe to extension-list (T324535) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:23 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for an-worker1088.eqiad.wmnet
  • 13:23 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for an-worker1088.eqiad.wmnet
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P48006 and previous config saved to /var/cache/conftool/dbconfig/20230509-131910-ladsgroup.json
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T335845)', diff saved to https://phabricator.wikimedia.org/P48005 and previous config saved to /var/cache/conftool/dbconfig/20230509-131231-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T335845)', diff saved to https://phabricator.wikimedia.org/P48004 and previous config saved to /var/cache/conftool/dbconfig/20230509-130524-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T335845)', diff saved to https://phabricator.wikimedia.org/P48003 and previous config saved to /var/cache/conftool/dbconfig/20230509-130459-ladsgroup.json
  • 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync after adding ldap-rw servers - jmm@cumin2002"
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T335845)', diff saved to https://phabricator.wikimedia.org/P48002 and previous config saved to /var/cache/conftool/dbconfig/20230509-130404-ladsgroup.json
  • 13:02 taavi@deploy1002: Started scap: Backport for Add RealMe to extension-list (T324535)
  • 13:01 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync after adding ldap-rw servers - jmm@cumin2002"
  • 12:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-worker1088.eqiad.wmnet with reason: Upgrading RAID controller firmware
  • 12:58 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-worker1088.eqiad.wmnet with reason: Upgrading RAID controller firmware
  • 12:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ldap-rw2001.wikimedia.org with OS bullseye
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T335845)', diff saved to https://phabricator.wikimedia.org/P48001 and previous config saved to /var/cache/conftool/dbconfig/20230509-125644-ladsgroup.json
  • 12:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 12:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T335845)', diff saved to https://phabricator.wikimedia.org/P48000 and previous config saved to /var/cache/conftool/dbconfig/20230509-125620-ladsgroup.json
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P47999 and previous config saved to /var/cache/conftool/dbconfig/20230509-124953-ladsgroup.json
  • 12:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-rw2001.wikimedia.org with reason: host reimage
  • 12:41 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-rw2001.wikimedia.org with reason: host reimage
  • 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P47997 and previous config saved to /var/cache/conftool/dbconfig/20230509-124114-ladsgroup.json
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P47996 and previous config saved to /var/cache/conftool/dbconfig/20230509-123447-ladsgroup.json
  • 12:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host eventlog1003.eqiad.wmnet
  • 12:29 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host ldap-rw2001.wikimedia.org with OS bullseye
  • 12:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host eventlog1003.eqiad.wmnet
  • 12:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P47995 and previous config saved to /var/cache/conftool/dbconfig/20230509-122608-ladsgroup.json
  • 12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T335845)', diff saved to https://phabricator.wikimedia.org/P47994 and previous config saved to /var/cache/conftool/dbconfig/20230509-121941-ladsgroup.json
  • 12:14 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aphlict1001.eqiad.wmnet
  • 12:14 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:14 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aphlict1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T335845)', diff saved to https://phabricator.wikimedia.org/P47992 and previous config saved to /var/cache/conftool/dbconfig/20230509-121119-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T335845)', diff saved to https://phabricator.wikimedia.org/P47991 and previous config saved to /var/cache/conftool/dbconfig/20230509-121102-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T335845)', diff saved to https://phabricator.wikimedia.org/P47990 and previous config saved to /var/cache/conftool/dbconfig/20230509-121053-ladsgroup.json
  • 12:06 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aphlict1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
  • 12:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T335845)', diff saved to https://phabricator.wikimedia.org/P47989 and previous config saved to /var/cache/conftool/dbconfig/20230509-120433-ladsgroup.json
  • 12:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 12:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 12:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T335845)', diff saved to https://phabricator.wikimedia.org/P47988 and previous config saved to /var/cache/conftool/dbconfig/20230509-120410-ladsgroup.json
  • 12:02 eoghan@cumin1001: START - Cookbook sre.dns.netbox
  • 12:02 kart_: Updated cxserver to 2023-05-08-134152-production (T336115, T335987, T331835)
  • 11:58 eoghan@cumin1001: START - Cookbook sre.hosts.decommission for hosts aphlict1001.eqiad.wmnet
  • 11:58 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 11:57 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P47987 and previous config saved to /var/cache/conftool/dbconfig/20230509-115547-ladsgroup.json
  • 11:53 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 11:53 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 11:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P47986 and previous config saved to /var/cache/conftool/dbconfig/20230509-114903-ladsgroup.json
  • 11:45 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 11:45 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P47985 and previous config saved to /var/cache/conftool/dbconfig/20230509-114041-ladsgroup.json
  • 11:36 kart_: Updated MinT to 2023-05-09-110213-production (T331505, T335725, T331505)
  • 11:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P47984 and previous config saved to /var/cache/conftool/dbconfig/20230509-113357-ladsgroup.json
  • 11:33 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 11:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ldap-rw1001.wikimedia.org with OS bullseye
  • 11:29 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 11:27 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T335845)', diff saved to https://phabricator.wikimedia.org/P47983 and previous config saved to /var/cache/conftool/dbconfig/20230509-112535-ladsgroup.json
  • 11:23 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 11:20 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 11:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-rw1001.wikimedia.org with reason: host reimage
  • 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T335845)', diff saved to https://phabricator.wikimedia.org/P47982 and previous config saved to /var/cache/conftool/dbconfig/20230509-111851-ladsgroup.json
  • 11:18 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 11:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T335845)', diff saved to https://phabricator.wikimedia.org/P47981 and previous config saved to /var/cache/conftool/dbconfig/20230509-111755-ladsgroup.json
  • 11:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 11:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 11:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T335845)', diff saved to https://phabricator.wikimedia.org/P47980 and previous config saved to /var/cache/conftool/dbconfig/20230509-111730-ladsgroup.json
  • 11:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-rw1001.wikimedia.org with reason: host reimage
  • 11:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T335845)', diff saved to https://phabricator.wikimedia.org/P47979 and previous config saved to /var/cache/conftool/dbconfig/20230509-111235-ladsgroup.json
  • 11:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 11:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 11:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T335845)', diff saved to https://phabricator.wikimedia.org/P47978 and previous config saved to /var/cache/conftool/dbconfig/20230509-111211-ladsgroup.json
  • 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host ldap-rw1001.wikimedia.org with OS bullseye
  • 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P47977 and previous config saved to /var/cache/conftool/dbconfig/20230509-110222-ladsgroup.json
  • 10:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P47976 and previous config saved to /var/cache/conftool/dbconfig/20230509-105704-ladsgroup.json
  • 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P47975 and previous config saved to /var/cache/conftool/dbconfig/20230509-104715-ladsgroup.json
  • 10:45 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudcontrol2001-dev.wikimedia.org
  • 10:45 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:44 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 10:42 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 10:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P47974 and previous config saved to /var/cache/conftool/dbconfig/20230509-104158-ladsgroup.json
  • 10:39 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2001-dev.wikimedia.org
  • 10:36 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 10:36 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 10:32 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts cloudcontrol2001-dev.wikimedia.org
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T335845)', diff saved to https://phabricator.wikimedia.org/P47973 and previous config saved to /var/cache/conftool/dbconfig/20230509-103209-ladsgroup.json
  • 10:29 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2001-dev.wikimedia.org
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T335845)', diff saved to https://phabricator.wikimedia.org/P47972 and previous config saved to /var/cache/conftool/dbconfig/20230509-102652-ladsgroup.json
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T335845)', diff saved to https://phabricator.wikimedia.org/P47971 and previous config saved to /var/cache/conftool/dbconfig/20230509-102644-ladsgroup.json
  • 10:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 10:26 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T335845)', diff saved to https://phabricator.wikimedia.org/P47970 and previous config saved to /var/cache/conftool/dbconfig/20230509-102619-ladsgroup.json
  • 10:26 volans@cumin1001: END (FAIL) - Cookbook sre.netbox.update-extras (exit_code=1) rolling update on A:netbox-canary
  • 10:26 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
  • 10:24 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e1-eqiad.mgmt with reason: test on ssw1-e1-eqiad will take ospf on lsw1-e1-eqiad down.
  • 10:24 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e1-eqiad.mgmt with reason: test on ssw1-e1-eqiad will take ospf on lsw1-e1-eqiad down.
  • 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T335845)', diff saved to https://phabricator.wikimedia.org/P47969 and previous config saved to /var/cache/conftool/dbconfig/20230509-102001-ladsgroup.json
  • 10:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T335845)', diff saved to https://phabricator.wikimedia.org/P47968 and previous config saved to /var/cache/conftool/dbconfig/20230509-101938-ladsgroup.json
  • 10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P47967 and previous config saved to /var/cache/conftool/dbconfig/20230509-101113-ladsgroup.json
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P47966 and previous config saved to /var/cache/conftool/dbconfig/20230509-100431-ladsgroup.json
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P47965 and previous config saved to /var/cache/conftool/dbconfig/20230509-095607-ladsgroup.json
  • 09:55 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudcontrol2001-dev.wikimedia.org
  • 09:55 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:55 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2001-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
  • 09:53 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2001-dev.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P47964 and previous config saved to /var/cache/conftool/dbconfig/20230509-094925-ladsgroup.json
  • 09:49 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T335845)', diff saved to https://phabricator.wikimedia.org/P47962 and previous config saved to /var/cache/conftool/dbconfig/20230509-094100-ladsgroup.json
  • 09:37 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2001-dev.wikimedia.org
  • 09:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping3003.esams.wmnet
  • 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T335845)', diff saved to https://phabricator.wikimedia.org/P47961 and previous config saved to /var/cache/conftool/dbconfig/20230509-093419-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T335845)', diff saved to https://phabricator.wikimedia.org/P47960 and previous config saved to /var/cache/conftool/dbconfig/20230509-093320-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 09:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 09:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping3003.esams.wmnet
  • 09:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 09:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 09:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T335845)', diff saved to https://phabricator.wikimedia.org/P47959 and previous config saved to /var/cache/conftool/dbconfig/20230509-092843-ladsgroup.json
  • 09:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 09:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 09:23 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 09:17 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.8 refs T330214
  • 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping2003.codfw.wmnet
  • 09:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping2003.codfw.wmnet
  • 08:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1003.eqiad.wmnet
  • 08:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping1003.eqiad.wmnet
  • 08:40 marostegui: Stop mariadb on db1115 (old zarcillo master) T334455
  • 08:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 08:39 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 08:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-rw1001.wikimedia.org
  • 08:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-rw1001.wikimedia.org - jmm@cumin2002"
  • 08:36 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-rw1001.wikimedia.org - jmm@cumin2002"
  • 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-rw1001.wikimedia.org on all recursors
  • 08:36 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-rw1001.wikimedia.org on all recursors
  • 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-rw1001.wikimedia.org - jmm@cumin2002"
  • 08:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-rw1001.wikimedia.org - jmm@cumin2002"
  • 08:30 marostegui: Failover m5-master from dbproxy1021 to dbproxy1017
  • 08:28 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:28 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-rw1001.wikimedia.org
  • 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-rw2001.wikimedia.org
  • 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-rw2001.wikimedia.org - jmm@cumin2002"
  • 08:19 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-rw2001.wikimedia.org - jmm@cumin2002"
  • 08:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-rw2001.wikimedia.org on all recursors
  • 08:19 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-rw2001.wikimedia.org on all recursors
  • 08:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:18 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:16 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: Release v3.2.9-wmf2 to production - volans@cumin1001 - T314933
  • 08:13 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: Release v3.2.9-wmf2 to production - volans@cumin1001 - T314933
  • 08:13 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: netbox upgrade
  • 08:13 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: netbox upgrade
  • 08:12 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 08:12 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-rw2001.wikimedia.org - jmm@cumin2002"
  • 08:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-rw2001.wikimedia.org - jmm@cumin2002"
  • 08:08 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: Release v3.2.9-wmf2 to production - volans@cumin1001 - T314933
  • 08:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:05 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-rw2001.wikimedia.org
  • 08:04 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: Release v3.2.9-wmf2 to production - volans@cumin1001 - T314933
  • 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2002.wikimedia.org
  • 06:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon2002.wikimedia.org
  • 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host netmon1003.wikimedia.org
  • 06:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
  • 06:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
  • 06:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2001.codfw.wmnet
  • 05:28 marostegui: Starting db-inventory eqiad failover from db1115 to db1215 - T335014
  • 05:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet,db[1115,1215].eqiad.wmnet with reason: Primary switchover db_inventory T335014
  • 05:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2185.codfw.wmnet,db[1115,1215].eqiad.wmnet with reason: Primary switchover db_inventory T335014
  • 03:50 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.8 refs T330214 (duration: 47m 55s)
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.8 refs T330214
  • 01:53 cstone: civicrm upgraded from 301e24e4 to d8a1a562
  • 00:43 eileen: civicrm upgraded from d5229d22 to 301e24e4
  • 00:06 zabe@deploy1002: Finished scap: Backport for Start writing to af_actor/afh_actor everywhere (T334295) (duration: 07m 22s)
  • 00:00 zabe@deploy1002: zabe: Backport for Start writing to af_actor/afh_actor everywhere (T334295) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet

2023-05-08

  • 23:58 zabe@deploy1002: Started scap: Backport for Start writing to af_actor/afh_actor everywhere (T334295)
  • 23:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T335845)', diff saved to https://phabricator.wikimedia.org/P47958 and previous config saved to /var/cache/conftool/dbconfig/20230508-233832-ladsgroup.json
  • 23:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P47957 and previous config saved to /var/cache/conftool/dbconfig/20230508-232325-ladsgroup.json
  • 23:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P47956 and previous config saved to /var/cache/conftool/dbconfig/20230508-230819-ladsgroup.json
  • 22:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T335845)', diff saved to https://phabricator.wikimedia.org/P47955 and previous config saved to /var/cache/conftool/dbconfig/20230508-225313-ladsgroup.json
  • 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1224 (T335845)', diff saved to https://phabricator.wikimedia.org/P47954 and previous config saved to /var/cache/conftool/dbconfig/20230508-224657-ladsgroup.json
  • 22:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 22:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T335845)', diff saved to https://phabricator.wikimedia.org/P47953 and previous config saved to /var/cache/conftool/dbconfig/20230508-224622-ladsgroup.json
  • 22:34 eileen: config revision changed from 7ac11236 to 48f7485f - disabled populate contribution tracking
  • 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P47952 and previous config saved to /var/cache/conftool/dbconfig/20230508-223115-ladsgroup.json
  • 22:23 cstone: civicrm upgraded from 05523a9d to d5229d22
  • 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P47951 and previous config saved to /var/cache/conftool/dbconfig/20230508-221609-ladsgroup.json
  • 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T335845)', diff saved to https://phabricator.wikimedia.org/P47950 and previous config saved to /var/cache/conftool/dbconfig/20230508-220103-ladsgroup.json
  • 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T335845)', diff saved to https://phabricator.wikimedia.org/P47949 and previous config saved to /var/cache/conftool/dbconfig/20230508-215323-ladsgroup.json
  • 21:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T335845)', diff saved to https://phabricator.wikimedia.org/P47948 and previous config saved to /var/cache/conftool/dbconfig/20230508-215300-ladsgroup.json
  • 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P47947 and previous config saved to /var/cache/conftool/dbconfig/20230508-213754-ladsgroup.json
  • 21:24 mstyles@deploy1002: Finished scap: Backport for Disable translation memory on collabwiki (T313241) (duration: 06m 45s)
  • 21:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P47946 and previous config saved to /var/cache/conftool/dbconfig/20230508-212248-ladsgroup.json
  • 21:21 mforns@deploy1002: Finished deploy [airflow-dags/analytics@a6a3ceb]: (no justification provided) (duration: 00m 09s)
  • 21:21 mforns@deploy1002: Started deploy [airflow-dags/analytics@a6a3ceb]: (no justification provided)
  • 21:18 mstyles@deploy1002: mstyles and sbassett: Backport for Disable translation memory on collabwiki (T313241) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:17 mstyles@deploy1002: Started scap: Backport for Disable translation memory on collabwiki (T313241)
  • 21:09 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging1001.eqiad.wmnet
  • 21:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T335845)', diff saved to https://phabricator.wikimedia.org/P47945 and previous config saved to /var/cache/conftool/dbconfig/20230508-210742-ladsgroup.json
  • 21:02 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging1001.eqiad.wmnet
  • 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T335845)', diff saved to https://phabricator.wikimedia.org/P47944 and previous config saved to /var/cache/conftool/dbconfig/20230508-210119-ladsgroup.json
  • 21:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 21:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 21:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T335845)', diff saved to https://phabricator.wikimedia.org/P47943 and previous config saved to /var/cache/conftool/dbconfig/20230508-210056-ladsgroup.json
  • 20:59 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging1002.eqiad.wmnet
  • 20:53 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging1002.eqiad.wmnet
  • 20:49 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging1004.eqiad.wmnet
  • 20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P47942 and previous config saved to /var/cache/conftool/dbconfig/20230508-204549-ladsgroup.json
  • 20:43 mutante: miscweb2003 - rebooting
  • 20:41 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging1004.eqiad.wmnet
  • 20:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on miscweb2003.codfw.wmnet with reason: reboot
  • 20:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on miscweb2003.codfw.wmnet with reason: reboot
  • 20:39 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging1003.eqiad.wmnet
  • 20:38 taavi@deploy1002: Finished scap: Backport for Deploy fixed width indicator to wikis (T335307) (duration: 08m 22s)
  • 20:36 aokoth@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host vrts1001.eqiad.wmnet with OS bullseye
  • 20:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on miscweb2003.codfw.wmnet with reason: reboot
  • 20:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on miscweb2003.codfw.wmnet with reason: reboot
  • 20:32 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging1003.eqiad.wmnet
  • 20:31 taavi@deploy1002: jdlrobson and taavi: Backport for Deploy fixed width indicator to wikis (T335307) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P47941 and previous config saved to /var/cache/conftool/dbconfig/20230508-203043-ladsgroup.json
  • 20:30 taavi@deploy1002: Started scap: Backport for Deploy fixed width indicator to wikis (T335307)
  • 20:29 taavi@deploy1002: Finished scap: Backport for Ensure page load popupNotification is closed when the toggle button is clicked (T335153) (duration: 07m 58s)
  • 20:25 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2007.codfw.wmnet
  • 20:24 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts1001.eqiad.wmnet with reason: host reimage
  • 20:23 taavi@deploy1002: jdlrobson and taavi: Backport for Ensure page load popupNotification is closed when the toggle button is clicked (T335153) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 20:21 taavi@deploy1002: Started scap: Backport for Ensure page load popupNotification is closed when the toggle button is clicked (T335153)
  • 20:21 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts1001.eqiad.wmnet with reason: host reimage
  • 20:20 taavi@deploy1002: Finished scap: Backport for Update a/b test code for visual enhancements a/b test (T333715), Enable DiscussionTools visual enhancements a/b test (T302358) (duration: 11m 54s)
  • 20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T335845)', diff saved to https://phabricator.wikimedia.org/P47939 and previous config saved to /var/cache/conftool/dbconfig/20230508-201537-ladsgroup.json
  • 20:11 aokoth@cumin1001: START - Cookbook sre.ganeti.reimage for host vrts1001.eqiad.wmnet with OS bullseye
  • 20:09 taavi@deploy1002: kemayo and taavi: Backport for Update a/b test code for visual enhancements a/b test (T333715), Enable DiscussionTools visual enhancements a/b test (T302358) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T335845)', diff saved to https://phabricator.wikimedia.org/P47938 and previous config saved to /var/cache/conftool/dbconfig/20230508-200825-ladsgroup.json
  • 20:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 20:08 taavi@deploy1002: Started scap: Backport for Update a/b test code for visual enhancements a/b test (T333715), Enable DiscussionTools visual enhancements a/b test (T302358)
  • 20:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T335845)', diff saved to https://phabricator.wikimedia.org/P47937 and previous config saved to /var/cache/conftool/dbconfig/20230508-200802-ladsgroup.json
  • 20:05 aokoth@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host vrts1001.eqiad.wmnet
  • 20:05 aokoth@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM vrts1001.eqiad.wmnet - aokoth@cumin1001"
  • 20:04 aokoth@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM vrts1001.eqiad.wmnet - aokoth@cumin1001"
  • 20:04 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) vrts1001.eqiad.wmnet on all recursors
  • 20:04 aokoth@cumin1001: START - Cookbook sre.dns.wipe-cache vrts1001.eqiad.wmnet on all recursors
  • 20:04 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:04 aokoth@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM vrts1001.eqiad.wmnet - aokoth@cumin1001"
  • 20:03 aokoth@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM vrts1001.eqiad.wmnet - aokoth@cumin1001"
  • 19:59 aokoth@cumin1001: START - Cookbook sre.dns.netbox
  • 19:59 aokoth@cumin1001: START - Cookbook sre.ganeti.makevm for new host vrts1001.eqiad.wmnet
  • 19:54 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging1005.eqiad.wmnet
  • 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P47936 and previous config saved to /var/cache/conftool/dbconfig/20230508-195256-ladsgroup.json
  • 19:52 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2007.codfw.wmnet
  • 19:51 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2004.codfw.wmnet
  • 19:46 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging1005.eqiad.wmnet
  • 19:45 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2004.codfw.wmnet
  • 19:45 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging2004.codfw.wmnet
  • 19:41 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2006.codfw.wmnet
  • 19:39 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging2004.codfw.wmnet
  • 19:38 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging2005.codfw.wmnet
  • 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P47935 and previous config saved to /var/cache/conftool/dbconfig/20230508-193750-ladsgroup.json
  • 19:34 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2006.codfw.wmnet
  • 19:32 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging2005.codfw.wmnet
  • 19:31 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging2003.codfw.wmnet
  • 19:24 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging2003.codfw.wmnet
  • 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T335845)', diff saved to https://phabricator.wikimedia.org/P47934 and previous config saved to /var/cache/conftool/dbconfig/20230508-192243-ladsgroup.json
  • 19:20 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs2006.codfw.wmnet with reason: rebooting to help with lag
  • 19:20 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs2006.codfw.wmnet with reason: rebooting to help with lag
  • 19:20 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs2006.codfw.wmnet
  • 19:20 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs2006.codfw.wmnet
  • 19:18 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs1004.eqiad.wmnet with reason: rebooting to help with lag
  • 19:18 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs1004.eqiad.wmnet with reason: rebooting to help with lag
  • 19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1173 (T335845)', diff saved to https://phabricator.wikimedia.org/P47933 and previous config saved to /var/cache/conftool/dbconfig/20230508-191630-ladsgroup.json
  • 19:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 19:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T335845)', diff saved to https://phabricator.wikimedia.org/P47932 and previous config saved to /var/cache/conftool/dbconfig/20230508-191607-ladsgroup.json
  • 19:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 19 hosts with reason: rebooting to help with lag
  • 19:12 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 19 hosts with reason: rebooting to help with lag
  • 19:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P47931 and previous config saved to /var/cache/conftool/dbconfig/20230508-190100-ladsgroup.json
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P47930 and previous config saved to /var/cache/conftool/dbconfig/20230508-184554-ladsgroup.json
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T335845)', diff saved to https://phabricator.wikimedia.org/P47929 and previous config saved to /var/cache/conftool/dbconfig/20230508-183048-ladsgroup.json
  • 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T335845)', diff saved to https://phabricator.wikimedia.org/P47928 and previous config saved to /var/cache/conftool/dbconfig/20230508-182350-ladsgroup.json
  • 18:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 18:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T335845)', diff saved to https://phabricator.wikimedia.org/P47927 and previous config saved to /var/cache/conftool/dbconfig/20230508-182327-ladsgroup.json
  • 18:09 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:08 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P47926 and previous config saved to /var/cache/conftool/dbconfig/20230508-180820-ladsgroup.json
  • 18:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 18:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 18:04 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767 (duration: 113m 03s)
  • 18:03 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging2002.codfw.wmnet
  • 18:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 18:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T335845)', diff saved to https://phabricator.wikimedia.org/P47925 and previous config saved to /var/cache/conftool/dbconfig/20230508-180239-ladsgroup.json
  • 17:57 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging2002.codfw.wmnet
  • 17:54 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-logging2001.codfw.wmnet
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P47923 and previous config saved to /var/cache/conftool/dbconfig/20230508-175314-ladsgroup.json
  • 17:51 sukhe: set routing-options static route 208.80.153.224/28 [high-traffic1, codfw] next-hop 10.192.0.29: T326767
  • 17:48 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-logging2001.codfw.wmnet
  • 17:48 sukhe: restart pybal on lvs2011 to pick up bgp med change: T326767
  • 17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P47922 and previous config saved to /var/cache/conftool/dbconfig/20230508-174732-ladsgroup.json
  • 17:39 sukhe: homer "cr*-codfw*" commit "Gerrit: 914871 add new LVS host lvs2011": T326767
  • 17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T335845)', diff saved to https://phabricator.wikimedia.org/P47920 and previous config saved to /var/cache/conftool/dbconfig/20230508-173808-ladsgroup.json
  • 17:38 volans: installed spicerack 7.0.0 on cumin1001
  • 17:36 bking@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-airflow1001.eqiad.wmnet
  • 17:36 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:36 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-airflow1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 17:35 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2011
  • 17:35 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2011
  • 17:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2011.codfw.wmnet
  • 17:33 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs2011.codfw.wmnet
  • 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P47919 and previous config saved to /var/cache/conftool/dbconfig/20230508-173226-ladsgroup.json
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T335845)', diff saved to https://phabricator.wikimedia.org/P47918 and previous config saved to /var/cache/conftool/dbconfig/20230508-173152-ladsgroup.json
  • 17:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 17:31 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-airflow1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 17:31 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1132.eqiad.wmnet with OS buster
  • 17:29 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin2002.codfw.wmnet with reason: test spicerack v7.0.0
  • 17:29 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin2002.codfw.wmnet with reason: test spicerack v7.0.0
  • 17:28 volans: installed spicerack 7.0.0 on cumin2002
  • 17:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudswift1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 17:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudswift1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T335845)', diff saved to https://phabricator.wikimedia.org/P47917 and previous config saved to /var/cache/conftool/dbconfig/20230508-171720-ladsgroup.json
  • 17:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2011.codfw.wmnet with OS bullseye
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1214 (T335845)', diff saved to https://phabricator.wikimedia.org/P47916 and previous config saved to /var/cache/conftool/dbconfig/20230508-170902-ladsgroup.json
  • 17:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 17:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 17:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T335845)', diff saved to https://phabricator.wikimedia.org/P47915 and previous config saved to /var/cache/conftool/dbconfig/20230508-170828-ladsgroup.json
  • 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T335845)', diff saved to https://phabricator.wikimedia.org/P47914 and previous config saved to /var/cache/conftool/dbconfig/20230508-170542-ladsgroup.json
  • 16:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2011.codfw.wmnet with reason: host reimage
  • 16:55 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2011.codfw.wmnet with reason: host reimage
  • 16:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P47913 and previous config saved to /var/cache/conftool/dbconfig/20230508-165322-ladsgroup.json
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P47912 and previous config saved to /var/cache/conftool/dbconfig/20230508-165036-ladsgroup.json
  • 16:46 volans: uploaded spicerack_7.0.0 to apt.wikimedia.org bullseye-wikimedia
  • 16:39 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:39 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 16:39 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:38 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2011.codfw.wmnet with OS bullseye
  • 16:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P47910 and previous config saved to /var/cache/conftool/dbconfig/20230508-163816-ladsgroup.json
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P47909 and previous config saved to /var/cache/conftool/dbconfig/20230508-163530-ladsgroup.json
  • 16:33 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:33 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:32 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:32 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T335845)', diff saved to https://phabricator.wikimedia.org/P47908 and previous config saved to /var/cache/conftool/dbconfig/20230508-162309-ladsgroup.json
  • 16:20 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:20 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2011.codfw.wmnet with OS bullseye
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T335845)', diff saved to https://phabricator.wikimedia.org/P47907 and previous config saved to /var/cache/conftool/dbconfig/20230508-162024-ladsgroup.json
  • 16:14 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T335845)', diff saved to https://phabricator.wikimedia.org/P47906 and previous config saved to /var/cache/conftool/dbconfig/20230508-161313-ladsgroup.json
  • 16:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1211 (T335845)', diff saved to https://phabricator.wikimedia.org/P47905 and previous config saved to /var/cache/conftool/dbconfig/20230508-161258-ladsgroup.json
  • 16:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 16:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 16:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T335845)', diff saved to https://phabricator.wikimedia.org/P47904 and previous config saved to /var/cache/conftool/dbconfig/20230508-161235-ladsgroup.json
  • 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T335845)', diff saved to https://phabricator.wikimedia.org/P47903 and previous config saved to /var/cache/conftool/dbconfig/20230508-161234-ladsgroup.json
  • 16:11 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767
  • 16:02 bking@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=codfw
  • 15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P47902 and previous config saved to /var/cache/conftool/dbconfig/20230508-155729-ladsgroup.json
  • 15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P47901 and previous config saved to /var/cache/conftool/dbconfig/20230508-155728-ladsgroup.json
  • 15:47 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:46 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P47900 and previous config saved to /var/cache/conftool/dbconfig/20230508-154222-ladsgroup.json
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P47899 and previous config saved to /var/cache/conftool/dbconfig/20230508-154222-ladsgroup.json
  • 15:41 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:39 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:32 sukhe: ns1: remove dns2001, add dns2004 next-hop [ 208.80.153.48 208.80.153.111 208.80.153.10 ]: T335777
  • 15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T335845)', diff saved to https://phabricator.wikimedia.org/P47898 and previous config saved to /var/cache/conftool/dbconfig/20230508-152716-ladsgroup.json
  • 15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T335845)', diff saved to https://phabricator.wikimedia.org/P47897 and previous config saved to /var/cache/conftool/dbconfig/20230508-152716-ladsgroup.json
  • 15:25 moritzm: installing grep updates from Bullseye 11.7 point release
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1209 (T335845)', diff saved to https://phabricator.wikimedia.org/P47896 and previous config saved to /var/cache/conftool/dbconfig/20230508-151952-ladsgroup.json
  • 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T335845)', diff saved to https://phabricator.wikimedia.org/P47895 and previous config saved to /var/cache/conftool/dbconfig/20230508-151929-ladsgroup.json
  • 15:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T335845)', diff saved to https://phabricator.wikimedia.org/P47894 and previous config saved to /var/cache/conftool/dbconfig/20230508-151556-ladsgroup.json
  • 15:12 sukhe: [done] homer "cr*-codfw*" commit "Gerrit: 917341 add new DNS host dns2004": T326688
  • 15:09 sukhe: homer "cr*-codfw*" commit "Gerrit: 917341 add new DNS host dns2004"
  • 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P47893 and previous config saved to /var/cache/conftool/dbconfig/20230508-150423-ladsgroup.json
  • 15:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P47892 and previous config saved to /var/cache/conftool/dbconfig/20230508-150050-ladsgroup.json
  • 14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns2004.wikimedia.org
  • 14:57 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for dns2004.wikimedia.org
  • 14:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2004.wikimedia.org with OS bullseye
  • 14:51 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.7 refs T330214
  • 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P47891 and previous config saved to /var/cache/conftool/dbconfig/20230508-144916-ladsgroup.json
  • 14:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P47890 and previous config saved to /var/cache/conftool/dbconfig/20230508-144544-ladsgroup.json
  • 14:40 brennen: train 1.41.0-wmf.7 (T330213): proceeding to all wikis
  • 14:37 sukhe: sudo cumin -b1 -s1200 'A:cp and A:ulsfo' 'varnish-frontend-restart': T253093
  • 14:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2004.wikimedia.org with reason: host reimage
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T335845)', diff saved to https://phabricator.wikimedia.org/P47889 and previous config saved to /var/cache/conftool/dbconfig/20230508-143410-ladsgroup.json
  • 14:31 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2004.wikimedia.org with reason: host reimage
  • 14:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T335845)', diff saved to https://phabricator.wikimedia.org/P47888 and previous config saved to /var/cache/conftool/dbconfig/20230508-143038-ladsgroup.json
  • 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1203 (T335845)', diff saved to https://phabricator.wikimedia.org/P47887 and previous config saved to /var/cache/conftool/dbconfig/20230508-142543-ladsgroup.json
  • 14:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 14:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T335845)', diff saved to https://phabricator.wikimedia.org/P47886 and previous config saved to /var/cache/conftool/dbconfig/20230508-142520-ladsgroup.json
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T335845)', diff saved to https://phabricator.wikimedia.org/P47885 and previous config saved to /var/cache/conftool/dbconfig/20230508-142427-ladsgroup.json
  • 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T335845)', diff saved to https://phabricator.wikimedia.org/P47884 and previous config saved to /var/cache/conftool/dbconfig/20230508-142302-ladsgroup.json
  • 14:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 14:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T335845)', diff saved to https://phabricator.wikimedia.org/P47883 and previous config saved to /var/cache/conftool/dbconfig/20230508-142237-ladsgroup.json
  • 14:16 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2004.wikimedia.org with OS bullseye
  • 14:14 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P47882 and previous config saved to /var/cache/conftool/dbconfig/20230508-141014-ladsgroup.json
  • 14:09 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts an-airflow1001.eqiad.wmnet
  • 14:08 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: wgEventStreams - Add mediawiki.page_outlink_topic_prediction_change stream - T328899 (duration: 06m 54s)
  • 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P47881 and previous config saved to /var/cache/conftool/dbconfig/20230508-140731-ladsgroup.json
  • 13:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
  • 13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P47880 and previous config saved to /var/cache/conftool/dbconfig/20230508-135508-ladsgroup.json
  • 13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P47879 and previous config saved to /var/cache/conftool/dbconfig/20230508-135224-ladsgroup.json
  • 13:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
  • 13:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2002.codfw.wmnet
  • 13:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2002.codfw.wmnet
  • 13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T335845)', diff saved to https://phabricator.wikimedia.org/P47878 and previous config saved to /var/cache/conftool/dbconfig/20230508-134002-ladsgroup.json
  • 13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T335845)', diff saved to https://phabricator.wikimedia.org/P47877 and previous config saved to /var/cache/conftool/dbconfig/20230508-133718-ladsgroup.json
  • 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1193 (T335845)', diff saved to https://phabricator.wikimedia.org/P47876 and previous config saved to /var/cache/conftool/dbconfig/20230508-133034-ladsgroup.json
  • 13:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 13:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T335845)', diff saved to https://phabricator.wikimedia.org/P47875 and previous config saved to /var/cache/conftool/dbconfig/20230508-133011-ladsgroup.json
  • 13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T335845)', diff saved to https://phabricator.wikimedia.org/P47874 and previous config saved to /var/cache/conftool/dbconfig/20230508-132957-ladsgroup.json
  • 13:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 13:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T335845)', diff saved to https://phabricator.wikimedia.org/P47873 and previous config saved to /var/cache/conftool/dbconfig/20230508-132932-ladsgroup.json
  • 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3002.esams.wmnet
  • 13:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3002.esams.wmnet
  • 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P47872 and previous config saved to /var/cache/conftool/dbconfig/20230508-131504-ladsgroup.json
  • 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
  • 13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P47871 and previous config saved to /var/cache/conftool/dbconfig/20230508-131426-ladsgroup.json
  • 13:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
  • 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P47870 and previous config saved to /var/cache/conftool/dbconfig/20230508-125958-ladsgroup.json
  • 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P47869 and previous config saved to /var/cache/conftool/dbconfig/20230508-125920-ladsgroup.json
  • 12:56 moritzm: installing ruby-rack security updates
  • 12:55 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt
  • 12:55 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt
  • 12:51 moritzm: installing python-django security updates on stretch
  • 12:45 moritzm: installing openvswitch security updates
  • 12:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T335845)', diff saved to https://phabricator.wikimedia.org/P47868 and previous config saved to /var/cache/conftool/dbconfig/20230508-124452-ladsgroup.json
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T335845)', diff saved to https://phabricator.wikimedia.org/P47867 and previous config saved to /var/cache/conftool/dbconfig/20230508-124414-ladsgroup.json
  • 12:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
  • 12:40 topranks: rebooting cloudsw1-b1-codfw for OS upgrade T333316
  • 12:39 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt with reason: cloudsw1-b1-codfw OS upgrade
  • 12:38 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt with reason: cloudsw1-b1-codfw OS upgrade
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T335845)', diff saved to https://phabricator.wikimedia.org/P47866 and previous config saved to /var/cache/conftool/dbconfig/20230508-123654-ladsgroup.json
  • 12:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1192 (T335845)', diff saved to https://phabricator.wikimedia.org/P47865 and previous config saved to /var/cache/conftool/dbconfig/20230508-123624-ladsgroup.json
  • 12:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T335845)', diff saved to https://phabricator.wikimedia.org/P47864 and previous config saved to /var/cache/conftool/dbconfig/20230508-123614-ladsgroup.json
  • 12:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 12:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 12:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T335845)', diff saved to https://phabricator.wikimedia.org/P47863 and previous config saved to /var/cache/conftool/dbconfig/20230508-123554-ladsgroup.json
  • 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
  • 12:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
  • 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P47862 and previous config saved to /var/cache/conftool/dbconfig/20230508-122108-ladsgroup.json
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P47861 and previous config saved to /var/cache/conftool/dbconfig/20230508-122048-ladsgroup.json
  • 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2005.codfw.wmnet with OS bullseye
  • 12:06 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2448.codfw.wmnet
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P47860 and previous config saved to /var/cache/conftool/dbconfig/20230508-120602-ladsgroup.json
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P47859 and previous config saved to /var/cache/conftool/dbconfig/20230508-120542-ladsgroup.json
  • 11:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2005.codfw.wmnet with reason: host reimage
  • 11:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2005.codfw.wmnet with reason: host reimage
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T335845)', diff saved to https://phabricator.wikimedia.org/P47858 and previous config saved to /var/cache/conftool/dbconfig/20230508-115056-ladsgroup.json
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T335845)', diff saved to https://phabricator.wikimedia.org/P47857 and previous config saved to /var/cache/conftool/dbconfig/20230508-115036-ladsgroup.json
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T335845)', diff saved to https://phabricator.wikimedia.org/P47856 and previous config saved to /var/cache/conftool/dbconfig/20230508-114417-ladsgroup.json
  • 11:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T335845)', diff saved to https://phabricator.wikimedia.org/P47855 and previous config saved to /var/cache/conftool/dbconfig/20230508-114354-ladsgroup.json
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T335845)', diff saved to https://phabricator.wikimedia.org/P47854 and previous config saved to /var/cache/conftool/dbconfig/20230508-114336-ladsgroup.json
  • 11:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T335845)', diff saved to https://phabricator.wikimedia.org/P47853 and previous config saved to /var/cache/conftool/dbconfig/20230508-114312-ladsgroup.json
  • 11:41 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2005.codfw.wmnet with OS bullseye
  • 11:35 daniel@deploy1002: Finished scap: Backport for Enable parser cache warming jobs for parsoid on small wikis (T329366) (duration: 15m 26s)
  • 11:32 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host testvm2005.codfw.wmnet with OS bookworm
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P47851 and previous config saved to /var/cache/conftool/dbconfig/20230508-112848-ladsgroup.json
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P47850 and previous config saved to /var/cache/conftool/dbconfig/20230508-112805-ladsgroup.json
  • 11:21 daniel@deploy1002: daniel: Backport for Enable parser cache warming jobs for parsoid on small wikis (T329366) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 11:20 daniel@deploy1002: Started scap: Backport for Enable parser cache warming jobs for parsoid on small wikis (T329366)
  • 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P47849 and previous config saved to /var/cache/conftool/dbconfig/20230508-111342-ladsgroup.json
  • 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P47848 and previous config saved to /var/cache/conftool/dbconfig/20230508-111259-ladsgroup.json
  • 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1113 from dbctl T336029', diff saved to https://phabricator.wikimedia.org/P47847 and previous config saved to /var/cache/conftool/dbconfig/20230508-111113-marostegui.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 100%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47846 and previous config saved to /var/cache/conftool/dbconfig/20230508-110812-root.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 100%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47845 and previous config saved to /var/cache/conftool/dbconfig/20230508-110803-root.json
  • 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47844 and previous config saved to /var/cache/conftool/dbconfig/20230508-110756-root.json
  • 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 100%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47843 and previous config saved to /var/cache/conftool/dbconfig/20230508-110755-root.json
  • 11:04 duesen: config deployment failed because gitlab is down. Prod is out of sync with gerrit, and deploy1002 is in sync with gerrit. Will come back to this in an hour.
  • 10:59 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T335845)', diff saved to https://phabricator.wikimedia.org/P47842 and previous config saved to /var/cache/conftool/dbconfig/20230508-105835-ladsgroup.json
  • 10:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T335845)', diff saved to https://phabricator.wikimedia.org/P47841 and previous config saved to /var/cache/conftool/dbconfig/20230508-105753-ladsgroup.json
  • 10:56 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T320967)
  • 10:56 eoghan@cumin1001: END (PASS) - Cookbook sre.gitlab.failover (exit_code=0) Failover of gitlab from gitlab2002.wikimedia.org to gitlab1004.wikimedia.org
  • 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
  • 10:54 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T320967)
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 75%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47840 and previous config saved to /var/cache/conftool/dbconfig/20230508-105307-root.json
  • 10:53 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2005.codfw.wmnet with OS bookworm
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 75%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47839 and previous config saved to /var/cache/conftool/dbconfig/20230508-105258-root.json
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47838 and previous config saved to /var/cache/conftool/dbconfig/20230508-105252-root.json
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 75%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47837 and previous config saved to /var/cache/conftool/dbconfig/20230508-105250-root.json
  • 10:52 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host testvm2005.codfw.wmnet with OS bookworm
  • 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T335845)', diff saved to https://phabricator.wikimedia.org/P47836 and previous config saved to /var/cache/conftool/dbconfig/20230508-105215-ladsgroup.json
  • 10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T335845)', diff saved to https://phabricator.wikimedia.org/P47835 and previous config saved to /var/cache/conftool/dbconfig/20230508-105141-ladsgroup.json
  • 10:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
  • 10:51 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
  • 10:51 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
  • 10:50 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gitlab-replica.wikimedia.org on all recursors
  • 10:50 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache gitlab-replica.wikimedia.org on all recursors
  • 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T335845)', diff saved to https://phabricator.wikimedia.org/P47834 and previous config saved to /var/cache/conftool/dbconfig/20230508-105032-ladsgroup.json
  • 10:50 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gitlab.wikimedia.org on all recursors
  • 10:50 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache gitlab.wikimedia.org on all recursors
  • 10:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 10:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T335845)', diff saved to https://phabricator.wikimedia.org/P47833 and previous config saved to /var/cache/conftool/dbconfig/20230508-105007-ladsgroup.json
  • 10:47 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
  • 10:47 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
  • 10:47 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2010*} and A:lvs (T320967)
  • 10:45 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2010*} and A:lvs (T320967)
  • 10:44 daniel@deploy1002: scap failed: CalledProcessError Command 'sudo -u mwbuilder /usr/local/bin/update-mediawiki-tools-release' returned non-zero exit status 1. (duration: 00m 05s)
  • 10:44 daniel@deploy1002: Started scap: Backport for Enable parser cache warming jobs for parsoid on small wikis (T329366)
  • 10:41 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 50%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47832 and previous config saved to /var/cache/conftool/dbconfig/20230508-103802-root.json
  • 10:37 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 50%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47831 and previous config saved to /var/cache/conftool/dbconfig/20230508-103754-root.json
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47830 and previous config saved to /var/cache/conftool/dbconfig/20230508-103747-root.json
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 50%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47829 and previous config saved to /var/cache/conftool/dbconfig/20230508-103745-root.json
  • 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P47828 and previous config saved to /var/cache/conftool/dbconfig/20230508-103634-ladsgroup.json
  • 10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P47827 and previous config saved to /var/cache/conftool/dbconfig/20230508-103501-ladsgroup.json
  • 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
  • 10:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2002.codfw.wmnet
  • 10:28 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host testvm2005.codfw.wmnet with OS bookworm
  • 10:27 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host netflow2003.codfw.wmnet with OS bookworm
  • 10:24 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 25%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47826 and previous config saved to /var/cache/conftool/dbconfig/20230508-102258-root.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 25%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47825 and previous config saved to /var/cache/conftool/dbconfig/20230508-102249-root.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47824 and previous config saved to /var/cache/conftool/dbconfig/20230508-102242-root.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 25%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47823 and previous config saved to /var/cache/conftool/dbconfig/20230508-102240-root.json
  • 10:22 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS bullseye
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P47822 and previous config saved to /var/cache/conftool/dbconfig/20230508-102128-ladsgroup.json
  • 10:21 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P47821 and previous config saved to /var/cache/conftool/dbconfig/20230508-101955-ladsgroup.json
  • 10:18 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 10%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47820 and previous config saved to /var/cache/conftool/dbconfig/20230508-100753-root.json
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 10%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47819 and previous config saved to /var/cache/conftool/dbconfig/20230508-100744-root.json
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47818 and previous config saved to /var/cache/conftool/dbconfig/20230508-100737-root.json
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 10%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47817 and previous config saved to /var/cache/conftool/dbconfig/20230508-100736-root.json
  • 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T335845)', diff saved to https://phabricator.wikimedia.org/P47816 and previous config saved to /var/cache/conftool/dbconfig/20230508-100622-ladsgroup.json
  • 10:04 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T335845)', diff saved to https://phabricator.wikimedia.org/P47815 and previous config saved to /var/cache/conftool/dbconfig/20230508-100449-ladsgroup.json
  • 10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T335845)', diff saved to https://phabricator.wikimedia.org/P47814 and previous config saved to /var/cache/conftool/dbconfig/20230508-100003-ladsgroup.json
  • 09:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 09:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T335845)', diff saved to https://phabricator.wikimedia.org/P47813 and previous config saved to /var/cache/conftool/dbconfig/20230508-095928-ladsgroup.json
  • 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2161 (T335845)', diff saved to https://phabricator.wikimedia.org/P47812 and previous config saved to /var/cache/conftool/dbconfig/20230508-095724-ladsgroup.json
  • 09:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 09:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T335845)', diff saved to https://phabricator.wikimedia.org/P47811 and previous config saved to /var/cache/conftool/dbconfig/20230508-095659-ladsgroup.json
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 5%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47810 and previous config saved to /var/cache/conftool/dbconfig/20230508-095248-root.json
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 5%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47809 and previous config saved to /var/cache/conftool/dbconfig/20230508-095240-root.json
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47808 and previous config saved to /var/cache/conftool/dbconfig/20230508-095233-root.json
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 5%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47807 and previous config saved to /var/cache/conftool/dbconfig/20230508-095231-root.json
  • 09:48 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
  • 09:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P47806 and previous config saved to /var/cache/conftool/dbconfig/20230508-094422-ladsgroup.json
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P47805 and previous config saved to /var/cache/conftool/dbconfig/20230508-094153-ladsgroup.json
  • 09:40 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:40 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:40 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:39 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:39 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:39 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:38 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:38 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 3%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47804 and previous config saved to /var/cache/conftool/dbconfig/20230508-093743-root.json
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 3%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47803 and previous config saved to /var/cache/conftool/dbconfig/20230508-093735-root.json
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 3%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47802 and previous config saved to /var/cache/conftool/dbconfig/20230508-093728-root.json
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 3%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47801 and previous config saved to /var/cache/conftool/dbconfig/20230508-093726-root.json
  • 09:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P47800 and previous config saved to /var/cache/conftool/dbconfig/20230508-092916-ladsgroup.json
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P47799 and previous config saved to /var/cache/conftool/dbconfig/20230508-092647-ladsgroup.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2022 (re)pooling @ 1%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47798 and previous config saved to /var/cache/conftool/dbconfig/20230508-092232-root.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47797 and previous config saved to /var/cache/conftool/dbconfig/20230508-092223-root.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 1%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P47796 and previous config saved to /var/cache/conftool/dbconfig/20230508-092221-root.json
  • 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bookworm
  • 09:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T335845)', diff saved to https://phabricator.wikimedia.org/P47794 and previous config saved to /var/cache/conftool/dbconfig/20230508-091408-ladsgroup.json
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T335845)', diff saved to https://phabricator.wikimedia.org/P47793 and previous config saved to /var/cache/conftool/dbconfig/20230508-091140-ladsgroup.json
  • 09:10 ladsgroup@deploy1002: Finished scap: Backport for Set externallinks migration to read new on mediawiki.org and fawikiquote (T335343) (duration: 14m 04s)
  • 09:05 eoghan@cumin1001: START - Cookbook sre.gitlab.failover Failover of gitlab from gitlab2002.wikimedia.org to gitlab1004.wikimedia.org
  • 09:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T335845)', diff saved to https://phabricator.wikimedia.org/P47792 and previous config saved to /var/cache/conftool/dbconfig/20230508-090521-ladsgroup.json
  • 09:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 09:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 09:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T335845)', diff saved to https://phabricator.wikimedia.org/P47791 and previous config saved to /var/cache/conftool/dbconfig/20230508-090456-ladsgroup.json
  • 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 08:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 08:57 ladsgroup@deploy1002: ladsgroup: Backport for Set externallinks migration to read new on mediawiki.org and fawikiquote (T335343) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 08:56 ladsgroup@deploy1002: Started scap: Backport for Set externallinks migration to read new on mediawiki.org and fawikiquote (T335343)
  • 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022 es1025 es2025 es2022 for reboots', diff saved to https://phabricator.wikimedia.org/P47790 and previous config saved to /var/cache/conftool/dbconfig/20230508-085435-root.json
  • 08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P47789 and previous config saved to /var/cache/conftool/dbconfig/20230508-084950-ladsgroup.json
  • 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 08:43 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host netflow2003.codfw.wmnet with OS bookworm
  • 08:40 marostegui@deploy1002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master" (duration: 13m 18s)
  • 08:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P47788 and previous config saved to /var/cache/conftool/dbconfig/20230508-083444-ladsgroup.json
  • 08:29 marostegui@deploy1002: marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master" synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 08:28 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
  • 08:27 marostegui@deploy1002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master"
  • 08:27 vgutierrez: HAProxy updated to 2.6.13 on cp1077 and cp1085 - T334448
  • 08:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T335845)', diff saved to https://phabricator.wikimedia.org/P47787 and previous config saved to /var/cache/conftool/dbconfig/20230508-081937-ladsgroup.json
  • 08:18 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host netflow2003.codfw.wmnet with OS bookworm
  • 08:17 marostegui: Failover m3-master from dbproxy1020 to dbproxy1016
  • 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T335845)', diff saved to https://phabricator.wikimedia.org/P47786 and previous config saved to /var/cache/conftool/dbconfig/20230508-081415-ladsgroup.json
  • 08:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 08:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 08:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T335845)', diff saved to https://phabricator.wikimedia.org/P47785 and previous config saved to /var/cache/conftool/dbconfig/20230508-081353-ladsgroup.json
  • 08:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 08:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 08:13 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc1 master (duration: 15m 27s)
  • 07:59 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
  • 07:59 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc1 master synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:57 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc1 master
  • 07:54 _joe_: restarting varnish-frontend on cp5029, last host in eqsin/upload to be restarted
  • 07:53 vgutierrez: fetch HAProxy 2.6.13 on thirdparty/haproxy2.6 (apt.wm.o) - T334448
  • 07:51 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host netflow2003.codfw.wmnet with OS bookworm
  • 07:29 _joe_: restarting varnish-frontend on upload eqsin
  • 07:25 _joe_: running restart-cdn on cp5030
  • 07:22 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
  • 07:22 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 07:19 moritzm: updated bookworm installer to RC2 T330495
  • 07:11 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host netflow2003.codfw.wmnet with OS bookworm
  • 07:05 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 07:02 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 06:50 moritzm: bounce ferm on aux-k8s-ctrl1001
  • 06:49 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
  • 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host netflow2003.codfw.wmnet with OS bookworm
  • 06:48 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 06:48 kart_: Deployed MinT to production (T331505)
  • 06:47 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
  • 06:47 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 06:44 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 06:43 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 06:40 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 05:55 marostegui@deploy1002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 master" (duration: 27m 46s)
  • 05:46 phedenskog@deploy1002: Finished deploy [performance/navtiming@9b22d3b]: Measure largest contentful paint element type (duration: 00m 05s)
  • 05:46 phedenskog@deploy1002: Started deploy [performance/navtiming@9b22d3b]: Measure largest contentful paint element type
  • 05:42 marostegui@deploy1002: marostegui: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 master" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 05:28 marostegui@deploy1002: Started scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 master"
  • 05:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on pc1014.eqiad.wmnet with reason: Maintenance
  • 05:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on pc1014.eqiad.wmnet with reason: Maintenance
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113 (s5,s6) T336029', diff saved to https://phabricator.wikimedia.org/P47783 and previous config saved to /var/cache/conftool/dbconfig/20230508-051036-root.json
  • 05:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbproxy1013.eqiad.wmnet with reason: Maintenance
  • 05:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on dbproxy1013.eqiad.wmnet with reason: Maintenance
  • 04:54 marostegui: Deploy schema change on x1 eqiad wikishared with replication dbmaint T335834

2023-05-07

  • 00:54 sukhe: restart haproxy on cp1087: T334448

2023-05-06

  • 08:51 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
  • 08:03 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 07:50 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
  • 07:44 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
  • 07:07 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 06:50 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade

2023-05-05

  • 23:24 tzatziki: removing emails from 230 users per self-requests
  • 18:57 brennen@deploy1002: Finished scap: Backport for Revert "api: Use RevisionStore::newRevisionsFromBatch to fetch revision records" (T336008 T336022) (duration: 14m 21s)
  • 18:44 brennen@deploy1002: umherirrender and brennen: Backport for Revert "api: Use RevisionStore::newRevisionsFromBatch to fetch revision records" (T336008 T336022) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 18:42 brennen@deploy1002: Started scap: Backport for Revert "api: Use RevisionStore::newRevisionsFromBatch to fetch revision records" (T336008 T336022)
  • 18:25 brennen: train 1.41.0-wmf.7 (T330213): trying revert for T336008, T336022
  • 17:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2011.codfw.wmnet with OS bullseye
  • 17:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2011.codfw.wmnet with reason: host reimage
  • 17:24 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2011.codfw.wmnet with reason: host reimage
  • 17:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 17:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 16:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 16:52 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:45 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 16:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:44 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2011.codfw.wmnet with reason: host reimage
  • 16:35 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 16:35 btullis@cumin1001: Added views for new wiki: newiki T334041
  • 16:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:28 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2011.codfw.wmnet with OS bullseye
  • 16:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1023.eqiad.wmnet
  • 16:20 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:20 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1023.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 16:18 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1024.eqiad.wmnet
  • 16:17 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:17 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1023.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 16:16 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 16:15 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 16:13 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:10 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 16:10 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1024.eqiad.wmnet
  • 16:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 16:10 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:08 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1023.eqiad.wmnet
  • 16:06 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 16:06 btullis@cumin1001: Added views for new wiki: zhwiki T334041
  • 16:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 16:03 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:00 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 16:00 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 15:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt1020.eqiad.wmnet
  • 15:51 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:50 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 15:48 btullis@cumin1001: END (PASS) - Cookbook sre.presto.reboot-workers (exit_code=0) for Presto analytics cluster: Reboot Presto nodes
  • 15:42 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1020.eqiad.wmnet
  • 15:41 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 15:41 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt1020.eqiad.wmnet
  • 15:41 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:41 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1020.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 15:40 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1019.eqiad.wmnet
  • 15:40 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:40 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1020.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 15:39 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 15:37 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 15:27 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1019.eqiad.wmnet
  • 15:27 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt1020.eqiad.wmnet
  • 15:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 15:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 15:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T335845)', diff saved to https://phabricator.wikimedia.org/P47778 and previous config saved to /var/cache/conftool/dbconfig/20230505-152222-ladsgroup.json
  • 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P47777 and previous config saved to /var/cache/conftool/dbconfig/20230505-150716-ladsgroup.json
  • 15:06 mforns@deploy1002: Finished deploy [airflow-dags/analytics@11fa4e1]: (no justification provided) (duration: 00m 13s)
  • 15:06 mforns@deploy1002: Started deploy [airflow-dags/analytics@11fa4e1]: (no justification provided)
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P47776 and previous config saved to /var/cache/conftool/dbconfig/20230505-145209-ladsgroup.json
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T335845)', diff saved to https://phabricator.wikimedia.org/P47774 and previous config saved to /var/cache/conftool/dbconfig/20230505-143703-ladsgroup.json
  • 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
  • 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1222 (T335845)', diff saved to https://phabricator.wikimedia.org/P47773 and previous config saved to /var/cache/conftool/dbconfig/20230505-142940-ladsgroup.json
  • 14:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 14:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T335845)', diff saved to https://phabricator.wikimedia.org/P47772 and previous config saved to /var/cache/conftool/dbconfig/20230505-142917-ladsgroup.json
  • 14:26 btullis@cumin1001: START - Cookbook sre.presto.reboot-workers for Presto analytics cluster: Reboot Presto nodes
  • 14:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P47771 and previous config saved to /var/cache/conftool/dbconfig/20230505-141410-ladsgroup.json
  • 14:05 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 14:04 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 14:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host lvs2011.codfw.wmnet
  • 13:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P47770 and previous config saved to /var/cache/conftool/dbconfig/20230505-135904-ladsgroup.json
  • 13:57 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 13:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 13:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 13:48 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-worker1002.eqiad.wmnet with reason: New kernel, T335835
  • 13:48 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-worker1002.eqiad.wmnet with reason: New kernel, T335835
  • 13:48 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-worker1001.eqiad.wmnet with reason: New kernel, T335835
  • 13:48 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-worker1001.eqiad.wmnet with reason: New kernel, T335835
  • 13:48 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-etcd1003.eqiad.wmnet with reason: New kernel, T335835
  • 13:47 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-etcd1003.eqiad.wmnet with reason: New kernel, T335835
  • 13:47 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-etcd1002.eqiad.wmnet with reason: New kernel, T335835
  • 13:47 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-etcd1002.eqiad.wmnet with reason: New kernel, T335835
  • 13:47 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-etcd1001.eqiad.wmnet with reason: New kernel, T335835
  • 13:47 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-etcd1001.eqiad.wmnet with reason: New kernel, T335835
  • 13:47 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-ctrl1002.eqiad.wmnet with reason: New kernel, T335835
  • 13:47 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-ctrl1002.eqiad.wmnet with reason: New kernel, T335835
  • 13:47 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-ctrl1001.eqiad.wmnet with reason: New kernel, T335835
  • 13:47 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-ctrl1001.eqiad.wmnet with reason: New kernel, T335835
  • 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T335845)', diff saved to https://phabricator.wikimedia.org/P47769 and previous config saved to /var/cache/conftool/dbconfig/20230505-134358-ladsgroup.json
  • 13:39 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lists1003.wikimedia.org with reason: New kernel, T335835
  • 13:39 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lists1003.wikimedia.org with reason: New kernel, T335835
  • 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T335845)', diff saved to https://phabricator.wikimedia.org/P47768 and previous config saved to /var/cache/conftool/dbconfig/20230505-133631-ladsgroup.json
  • 13:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 13:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T335845)', diff saved to https://phabricator.wikimedia.org/P47767 and previous config saved to /var/cache/conftool/dbconfig/20230505-133556-ladsgroup.json
  • 13:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1005.eqiad.wmnet
  • 13:30 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: New kernel, T335835
  • 13:30 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: New kernel, T335835
  • 13:26 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: New kernel, T335835
  • 13:25 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: New kernel, T335835
  • 13:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1005.eqiad.wmnet
  • 13:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1004.eqiad.wmnet
  • 13:23 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: New kernel, T335835
  • 13:23 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: New kernel, T335835
  • 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P47766 and previous config saved to /var/cache/conftool/dbconfig/20230505-132050-ladsgroup.json
  • 13:14 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1004.eqiad.wmnet
  • 13:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1003.eqiad.wmnet
  • 13:13 andrewbogott: rebooting cloudbackup2001.codfw.wmnet, unresponsive
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P47765 and previous config saved to /var/cache/conftool/dbconfig/20230505-130544-ladsgroup.json
  • 13:05 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1003.eqiad.wmnet
  • 13:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1002.eqiad.wmnet
  • 12:57 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1002.eqiad.wmnet
  • 12:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
  • 12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T335845)', diff saved to https://phabricator.wikimedia.org/P47764 and previous config saved to /var/cache/conftool/dbconfig/20230505-125038-ladsgroup.json
  • 12:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T335845)', diff saved to https://phabricator.wikimedia.org/P47763 and previous config saved to /var/cache/conftool/dbconfig/20230505-124412-ladsgroup.json
  • 12:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 12:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T335845)', diff saved to https://phabricator.wikimedia.org/P47762 and previous config saved to /var/cache/conftool/dbconfig/20230505-124349-ladsgroup.json
  • 12:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-mariadb1002.eqiad.wmnet
  • 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P47761 and previous config saved to /var/cache/conftool/dbconfig/20230505-122843-ladsgroup.json
  • 12:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-mariadb1002.eqiad.wmnet
  • 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P47760 and previous config saved to /var/cache/conftool/dbconfig/20230505-121336-ladsgroup.json
  • 12:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-mariadb1001.eqiad.wmnet
  • 11:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-mariadb1001.eqiad.wmnet
  • 11:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1002.eqiad.wmnet
  • 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T335845)', diff saved to https://phabricator.wikimedia.org/P47759 and previous config saved to /var/cache/conftool/dbconfig/20230505-115830-ladsgroup.json
  • 11:52 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-db1002.eqiad.wmnet
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T335845)', diff saved to https://phabricator.wikimedia.org/P47758 and previous config saved to /var/cache/conftool/dbconfig/20230505-115126-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P47757 and previous config saved to /var/cache/conftool/dbconfig/20230505-112649-ladsgroup.json
  • 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P47756 and previous config saved to /var/cache/conftool/dbconfig/20230505-112605-ladsgroup.json
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P47755 and previous config saved to /var/cache/conftool/dbconfig/20230505-111145-ladsgroup.json
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P47754 and previous config saved to /var/cache/conftool/dbconfig/20230505-111100-ladsgroup.json
  • 10:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P47753 and previous config saved to /var/cache/conftool/dbconfig/20230505-105640-ladsgroup.json
  • 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P47752 and previous config saved to /var/cache/conftool/dbconfig/20230505-105555-ladsgroup.json
  • 10:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P47751 and previous config saved to /var/cache/conftool/dbconfig/20230505-104135-ladsgroup.json
  • 10:41 moritzm: installing wireshark security updates
  • 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P47750 and previous config saved to /var/cache/conftool/dbconfig/20230505-104050-ladsgroup.json
  • 09:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1170.eqiad.wmnet with reason: Host sad (T336033)
  • 09:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1170.eqiad.wmnet with reason: Host sad (T336033)
  • 09:14 Amir1: power cycled db1170
  • 09:10 marostegui: Failover m2-master from dbproxy1013 to dbproxy1015
  • 09:08 hnowlan@deploy1002: Finished deploy [restbase/deploy@8aba801]: deploying to host missing from configs (duration: 01m 22s)
  • 09:06 hnowlan@deploy1002: Started deploy [restbase/deploy@8aba801]: deploying to host missing from configs
  • 08:58 XioNoX: deploy CR914772 on all hosts running Bird
  • 08:15 godog: delete wal and chunks_head from prometheus5002 and prometheus4002 to let prometheus start back up and not crashloop - T309979
  • 08:07 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host netflow2003.codfw.wmnet with OS bookworm
  • 08:05 hashar@deploy1002: Finished deploy [integration/docroot@78e6f40]: build: Updating eslint-config-wikimedia to 0.25.0 (duration: 00m 13s)
  • 08:04 hashar@deploy1002: Started deploy [integration/docroot@78e6f40]: build: Updating eslint-config-wikimedia to 0.25.0
  • 07:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12 days, 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12 days, 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 06:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet
  • 06:51 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host netflow2003.codfw.wmnet with OS bookworm
  • 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netflow2003.codfw.wmnet
  • 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow2003.codfw.wmnet - jmm@cumin2002"
  • 06:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet
  • 06:49 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow2003.codfw.wmnet - jmm@cumin2002"
  • 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2004.codfw.wmnet
  • 06:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2004.codfw.wmnet
  • 06:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org
  • 06:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 136907
  • 06:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow2003.codfw.wmnet on all recursors
  • 06:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netflow2003.codfw.wmnet on all recursors
  • 06:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow2003.codfw.wmnet - jmm@cumin2002"
  • 06:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
  • 06:38 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow2003.codfw.wmnet - jmm@cumin2002"
  • 06:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 136907
  • 06:35 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 06:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host netflow2003.codfw.wmnet
  • 06:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2003.wikimedia.org
  • 06:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2003.wikimedia.org
  • 06:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
  • 06:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
  • 06:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1003.wikimedia.org
  • 06:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1003.wikimedia.org
  • 05:22 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 05:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T335845)', diff saved to https://phabricator.wikimedia.org/P47748 and previous config saved to /var/cache/conftool/dbconfig/20230505-050007-ladsgroup.json
  • 04:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P47747 and previous config saved to /var/cache/conftool/dbconfig/20230505-044500-ladsgroup.json
  • 04:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P47746 and previous config saved to /var/cache/conftool/dbconfig/20230505-042954-ladsgroup.json
  • 04:21 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 04:18 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 04:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 04:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T335845)', diff saved to https://phabricator.wikimedia.org/P47745 and previous config saved to /var/cache/conftool/dbconfig/20230505-041448-ladsgroup.json
  • 04:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T335845)', diff saved to https://phabricator.wikimedia.org/P47744 and previous config saved to /var/cache/conftool/dbconfig/20230505-040837-ladsgroup.json
  • 04:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 04:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 04:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T335845)', diff saved to https://phabricator.wikimedia.org/P47743 and previous config saved to /var/cache/conftool/dbconfig/20230505-040812-ladsgroup.json
  • 04:04 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 03:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P47742 and previous config saved to /var/cache/conftool/dbconfig/20230505-035306-ladsgroup.json
  • 03:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P47741 and previous config saved to /var/cache/conftool/dbconfig/20230505-033800-ladsgroup.json
  • 03:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T335845)', diff saved to https://phabricator.wikimedia.org/P47740 and previous config saved to /var/cache/conftool/dbconfig/20230505-032253-ladsgroup.json
  • 03:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T335845)', diff saved to https://phabricator.wikimedia.org/P47739 and previous config saved to /var/cache/conftool/dbconfig/20230505-031637-ladsgroup.json
  • 03:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P47738 and previous config saved to /var/cache/conftool/dbconfig/20230505-030130-ladsgroup.json
  • 02:54 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 02:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P47737 and previous config saved to /var/cache/conftool/dbconfig/20230505-024624-ladsgroup.json
  • 02:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 02:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 02:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T335845)', diff saved to https://phabricator.wikimedia.org/P47736 and previous config saved to /var/cache/conftool/dbconfig/20230505-023118-ladsgroup.json
  • 02:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
  • 02:30 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 02:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
  • 02:29 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 02:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T335845)', diff saved to https://phabricator.wikimedia.org/P47735 and previous config saved to /var/cache/conftool/dbconfig/20230505-022510-ladsgroup.json
  • 02:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T335845)', diff saved to https://phabricator.wikimedia.org/P47734 and previous config saved to /var/cache/conftool/dbconfig/20230505-022446-ladsgroup.json
  • 02:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 02:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 02:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T335845)', diff saved to https://phabricator.wikimedia.org/P47733 and previous config saved to /var/cache/conftool/dbconfig/20230505-022421-ladsgroup.json
  • 02:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P47732 and previous config saved to /var/cache/conftool/dbconfig/20230505-020915-ladsgroup.json
  • 01:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P47731 and previous config saved to /var/cache/conftool/dbconfig/20230505-015409-ladsgroup.json
  • 01:49 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
  • 01:45 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM prometheus6002.drmrs.wmnet
  • 01:41 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
  • 01:40 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
  • 01:39 denisse@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM prometheus6002.drmrs.wmnet
  • 01:39 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM prometheus5002.eqsin.wmnet
  • 01:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T335845)', diff saved to https://phabricator.wikimedia.org/P47730 and previous config saved to /var/cache/conftool/dbconfig/20230505-013903-ladsgroup.json
  • 01:32 denisse@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM prometheus5002.eqsin.wmnet
  • 01:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T335845)', diff saved to https://phabricator.wikimedia.org/P47729 and previous config saved to /var/cache/conftool/dbconfig/20230505-013232-ladsgroup.json
  • 01:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 01:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 01:32 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM prometheus4002.ulsfo.wmnet
  • 01:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T335845)', diff saved to https://phabricator.wikimedia.org/P47728 and previous config saved to /var/cache/conftool/dbconfig/20230505-013206-ladsgroup.json
  • 01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T335845)', diff saved to https://phabricator.wikimedia.org/P47727 and previous config saved to /var/cache/conftool/dbconfig/20230505-013108-ladsgroup.json
  • 01:31 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
  • 01:30 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
  • 01:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T335845)', diff saved to https://phabricator.wikimedia.org/P47726 and previous config saved to /var/cache/conftool/dbconfig/20230505-012950-ladsgroup.json
  • 01:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 01:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 01:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T335845)', diff saved to https://phabricator.wikimedia.org/P47725 and previous config saved to /var/cache/conftool/dbconfig/20230505-012927-ladsgroup.json
  • 01:26 denisse@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM prometheus4002.ulsfo.wmnet
  • 01:25 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM prometheus3002.esams.wmnet
  • 01:21 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
  • 01:20 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
  • 01:18 denisse@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM prometheus3002.esams.wmnet
  • 01:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P47724 and previous config saved to /var/cache/conftool/dbconfig/20230505-011700-ladsgroup.json
  • 01:16 denisse@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
  • 01:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P47723 and previous config saved to /var/cache/conftool/dbconfig/20230505-011421-ladsgroup.json
  • 01:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P47722 and previous config saved to /var/cache/conftool/dbconfig/20230505-010154-ladsgroup.json
  • 00:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P47721 and previous config saved to /var/cache/conftool/dbconfig/20230505-005914-ladsgroup.json
  • 00:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T335845)', diff saved to https://phabricator.wikimedia.org/P47720 and previous config saved to /var/cache/conftool/dbconfig/20230505-004648-ladsgroup.json
  • 00:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T335845)', diff saved to https://phabricator.wikimedia.org/P47719 and previous config saved to /var/cache/conftool/dbconfig/20230505-004408-ladsgroup.json
  • 00:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T335845)', diff saved to https://phabricator.wikimedia.org/P47718 and previous config saved to /var/cache/conftool/dbconfig/20230505-003914-ladsgroup.json
  • 00:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 00:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 00:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 00:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T335845)', diff saved to https://phabricator.wikimedia.org/P47717 and previous config saved to /var/cache/conftool/dbconfig/20230505-003845-ladsgroup.json
  • 00:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T335845)', diff saved to https://phabricator.wikimedia.org/P47716 and previous config saved to /var/cache/conftool/dbconfig/20230505-003749-ladsgroup.json
  • 00:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 00:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 00:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 00:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 00:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 00:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T335845)', diff saved to https://phabricator.wikimedia.org/P47715 and previous config saved to /var/cache/conftool/dbconfig/20230505-003359-ladsgroup.json
  • 00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P47714 and previous config saved to /var/cache/conftool/dbconfig/20230505-002339-ladsgroup.json
  • 00:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P47713 and previous config saved to /var/cache/conftool/dbconfig/20230505-001853-ladsgroup.json
  • 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P47712 and previous config saved to /var/cache/conftool/dbconfig/20230505-000832-ladsgroup.json
  • 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P47711 and previous config saved to /var/cache/conftool/dbconfig/20230505-000346-ladsgroup.json

2023-05-04

  • 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T335845)', diff saved to https://phabricator.wikimedia.org/P47710 and previous config saved to /var/cache/conftool/dbconfig/20230504-235326-ladsgroup.json
  • 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T335845)', diff saved to https://phabricator.wikimedia.org/P47709 and previous config saved to /var/cache/conftool/dbconfig/20230504-234840-ladsgroup.json
  • 23:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T335845)', diff saved to https://phabricator.wikimedia.org/P47708 and previous config saved to /var/cache/conftool/dbconfig/20230504-234544-ladsgroup.json
  • 23:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 23:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 23:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T335845)', diff saved to https://phabricator.wikimedia.org/P47707 and previous config saved to /var/cache/conftool/dbconfig/20230504-234520-ladsgroup.json
  • 23:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T335845)', diff saved to https://phabricator.wikimedia.org/P47706 and previous config saved to /var/cache/conftool/dbconfig/20230504-234330-ladsgroup.json
  • 23:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 23:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 23:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T335845)', diff saved to https://phabricator.wikimedia.org/P47705 and previous config saved to /var/cache/conftool/dbconfig/20230504-234306-ladsgroup.json
  • 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P47704 and previous config saved to /var/cache/conftool/dbconfig/20230504-233013-ladsgroup.json
  • 23:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P47703 and previous config saved to /var/cache/conftool/dbconfig/20230504-232800-ladsgroup.json
  • 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P47702 and previous config saved to /var/cache/conftool/dbconfig/20230504-231507-ladsgroup.json
  • 23:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P47701 and previous config saved to /var/cache/conftool/dbconfig/20230504-231254-ladsgroup.json
  • 23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T335845)', diff saved to https://phabricator.wikimedia.org/P47700 and previous config saved to /var/cache/conftool/dbconfig/20230504-230001-ladsgroup.json
  • 22:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T335845)', diff saved to https://phabricator.wikimedia.org/P47699 and previous config saved to /var/cache/conftool/dbconfig/20230504-225747-ladsgroup.json
  • 22:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T335845)', diff saved to https://phabricator.wikimedia.org/P47698 and previous config saved to /var/cache/conftool/dbconfig/20230504-225336-ladsgroup.json
  • 22:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 22:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 22:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1122 (T335845)', diff saved to https://phabricator.wikimedia.org/P47697 and previous config saved to /var/cache/conftool/dbconfig/20230504-225013-ladsgroup.json
  • 22:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 22:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 22:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 22:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 22:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T335838)', diff saved to https://phabricator.wikimedia.org/P47696 and previous config saved to /var/cache/conftool/dbconfig/20230504-224646-ladsgroup.json
  • 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P47695 and previous config saved to /var/cache/conftool/dbconfig/20230504-223139-ladsgroup.json
  • 22:22 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host lvs2011.codfw.wmnet
  • 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P47694 and previous config saved to /var/cache/conftool/dbconfig/20230504-221633-ladsgroup.json
  • 22:12 brennen@deploy1002: Finished scap: Backport for api: Use Status::isGood in ApiQueryRevisionsBase::getRevisionRecords (T336008) (duration: 09m 07s)
  • 22:04 brennen@deploy1002: brennen: Backport for api: Use Status::isGood in ApiQueryRevisionsBase::getRevisionRecords (T336008) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:03 brennen@deploy1002: Started scap: Backport for api: Use Status::isGood in ApiQueryRevisionsBase::getRevisionRecords (T336008)
  • 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T335838)', diff saved to https://phabricator.wikimedia.org/P47693 and previous config saved to /var/cache/conftool/dbconfig/20230504-220127-ladsgroup.json
  • 21:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1221 (T335838)', diff saved to https://phabricator.wikimedia.org/P47692 and previous config saved to /var/cache/conftool/dbconfig/20230504-215511-ladsgroup.json
  • 21:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 21:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T335838)', diff saved to https://phabricator.wikimedia.org/P47691 and previous config saved to /var/cache/conftool/dbconfig/20230504-215447-ladsgroup.json
  • 21:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P47690 and previous config saved to /var/cache/conftool/dbconfig/20230504-213941-ladsgroup.json
  • 21:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P47689 and previous config saved to /var/cache/conftool/dbconfig/20230504-212434-ladsgroup.json
  • 21:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T335838)', diff saved to https://phabricator.wikimedia.org/P47688 and previous config saved to /var/cache/conftool/dbconfig/20230504-210928-ladsgroup.json
  • 21:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T335845)', diff saved to https://phabricator.wikimedia.org/P47687 and previous config saved to /var/cache/conftool/dbconfig/20230504-210513-ladsgroup.json
  • 21:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1199 (T335838)', diff saved to https://phabricator.wikimedia.org/P47686 and previous config saved to /var/cache/conftool/dbconfig/20230504-210057-ladsgroup.json
  • 21:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 21:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 21:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T335838)', diff saved to https://phabricator.wikimedia.org/P47685 and previous config saved to /var/cache/conftool/dbconfig/20230504-210033-ladsgroup.json
  • 20:57 brennen@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.7 refs T330213 (duration: 06m 02s)
  • 20:51 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.7 refs T330213
  • 20:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P47684 and previous config saved to /var/cache/conftool/dbconfig/20230504-205007-ladsgroup.json
  • 20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P47683 and previous config saved to /var/cache/conftool/dbconfig/20230504-204527-ladsgroup.json
  • 20:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P47682 and previous config saved to /var/cache/conftool/dbconfig/20230504-203501-ladsgroup.json
  • 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P47681 and previous config saved to /var/cache/conftool/dbconfig/20230504-203021-ladsgroup.json
  • 20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T335845)', diff saved to https://phabricator.wikimedia.org/P47680 and previous config saved to /var/cache/conftool/dbconfig/20230504-201955-ladsgroup.json
  • 20:19 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:19 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:17 brennen@deploy1002: Finished scap: Backport for Fix file page integration (T335997) (duration: 10m 50s)
  • 20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T335838)', diff saved to https://phabricator.wikimedia.org/P47679 and previous config saved to /var/cache/conftool/dbconfig/20230504-201514-ladsgroup.json
  • 20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T335845)', diff saved to https://phabricator.wikimedia.org/P47678 and previous config saved to /var/cache/conftool/dbconfig/20230504-201332-ladsgroup.json
  • 20:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 20:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T335845)', diff saved to https://phabricator.wikimedia.org/P47677 and previous config saved to /var/cache/conftool/dbconfig/20230504-201306-ladsgroup.json
  • 20:08 brennen@deploy1002: brennen and jdlrobson: Backport for Fix file page integration (T335997) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:06 brennen@deploy1002: Started scap: Backport for Fix file page integration (T335997)
  • 20:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 (T335838)', diff saved to https://phabricator.wikimedia.org/P47676 and previous config saved to /var/cache/conftool/dbconfig/20230504-200644-ladsgroup.json
  • 20:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 20:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 20:06 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:06 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:03 mutante: miscweb1003 - rebooting
  • 20:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on miscweb1003.eqiad.wmnet with reason: reboot
  • 20:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on miscweb1003.eqiad.wmnet with reason: reboot
  • 20:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 20:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 20:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T335838)', diff saved to https://phabricator.wikimedia.org/P47675 and previous config saved to /var/cache/conftool/dbconfig/20230504-200141-ladsgroup.json
  • 20:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T335845)', diff saved to https://phabricator.wikimedia.org/P47674 and previous config saved to /var/cache/conftool/dbconfig/20230504-200131-ladsgroup.json
  • 20:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on people2002.codfw.wmnet with reason: maintenance upgrade
  • 20:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on people2002.codfw.wmnet with reason: maintenance upgrade
  • 20:00 mutante: people2002 (people.wikimedia.org) reboot, <1 min downtime
  • 19:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P47673 and previous config saved to /var/cache/conftool/dbconfig/20230504-195800-ladsgroup.json
  • 19:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1054.eqiad.wmnet
  • 19:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P47672 and previous config saved to /var/cache/conftool/dbconfig/20230504-194635-ladsgroup.json
  • 19:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P47671 and previous config saved to /var/cache/conftool/dbconfig/20230504-194624-ladsgroup.json
  • 19:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2055.codfw.wmnet
  • 19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P47670 and previous config saved to /var/cache/conftool/dbconfig/20230504-194254-ladsgroup.json
  • 19:42 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1054.eqiad.wmnet
  • 19:38 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2055.codfw.wmnet
  • 19:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P47669 and previous config saved to /var/cache/conftool/dbconfig/20230504-193129-ladsgroup.json
  • 19:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P47668 and previous config saved to /var/cache/conftool/dbconfig/20230504-193118-ladsgroup.json
  • 19:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1053.eqiad.wmnet
  • 19:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2054.codfw.wmnet
  • 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T335845)', diff saved to https://phabricator.wikimedia.org/P47667 and previous config saved to /var/cache/conftool/dbconfig/20230504-192747-ladsgroup.json
  • 19:23 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1053.eqiad.wmnet
  • 19:21 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2054.codfw.wmnet
  • 19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T335838)', diff saved to https://phabricator.wikimedia.org/P47666 and previous config saved to /var/cache/conftool/dbconfig/20230504-191623-ladsgroup.json
  • 19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T335845)', diff saved to https://phabricator.wikimedia.org/P47665 and previous config saved to /var/cache/conftool/dbconfig/20230504-191612-ladsgroup.json
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T335845)', diff saved to https://phabricator.wikimedia.org/P47664 and previous config saved to /var/cache/conftool/dbconfig/20230504-191528-ladsgroup.json
  • 19:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2053.codfw.wmnet
  • 19:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1052.eqiad.wmnet
  • 19:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T335838)', diff saved to https://phabricator.wikimedia.org/P47663 and previous config saved to /var/cache/conftool/dbconfig/20230504-191001-ladsgroup.json
  • 19:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 19:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 19:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T335838)', diff saved to https://phabricator.wikimedia.org/P47662 and previous config saved to /var/cache/conftool/dbconfig/20230504-190937-ladsgroup.json
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T335845)', diff saved to https://phabricator.wikimedia.org/P47661 and previous config saved to /var/cache/conftool/dbconfig/20230504-190757-ladsgroup.json
  • 19:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2053.codfw.wmnet
  • 19:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1052.eqiad.wmnet
  • 19:02 fab@deploy1002: Finished deploy [airflow-dags/research@88ebdf7]: (no justification provided) (duration: 00m 03s)
  • 19:02 fab@deploy1002: Started deploy [airflow-dags/research@88ebdf7]: (no justification provided)
  • 19:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P47660 and previous config saved to /var/cache/conftool/dbconfig/20230504-190022-ladsgroup.json
  • 18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P47659 and previous config saved to /var/cache/conftool/dbconfig/20230504-185431-ladsgroup.json
  • 18:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2052.codfw.wmnet
  • 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P47658 and previous config saved to /var/cache/conftool/dbconfig/20230504-185250-ladsgroup.json
  • 18:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1051.eqiad.wmnet
  • 18:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2052.codfw.wmnet
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P47657 and previous config saved to /var/cache/conftool/dbconfig/20230504-184516-ladsgroup.json
  • 18:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1051.eqiad.wmnet
  • 18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P47656 and previous config saved to /var/cache/conftool/dbconfig/20230504-183925-ladsgroup.json
  • 18:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2051.codfw.wmnet
  • 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P47655 and previous config saved to /var/cache/conftool/dbconfig/20230504-183744-ladsgroup.json
  • 18:37 fab@deploy1002: Finished deploy [airflow-dags/research@88ebdf7]: (no justification provided) (duration: 00m 09s)
  • 18:37 fab@deploy1002: Started deploy [airflow-dags/research@88ebdf7]: (no justification provided)
  • 18:31 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2051.codfw.wmnet
  • 18:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1050.eqiad.wmnet
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T335845)', diff saved to https://phabricator.wikimedia.org/P47654 and previous config saved to /var/cache/conftool/dbconfig/20230504-183010-ladsgroup.json
  • 18:28 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 18:24 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1050.eqiad.wmnet
  • 18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T335838)', diff saved to https://phabricator.wikimedia.org/P47653 and previous config saved to /var/cache/conftool/dbconfig/20230504-182418-ladsgroup.json
  • 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T335845)', diff saved to https://phabricator.wikimedia.org/P47652 and previous config saved to /var/cache/conftool/dbconfig/20230504-182301-ladsgroup.json
  • 18:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T335845)', diff saved to https://phabricator.wikimedia.org/P47651 and previous config saved to /var/cache/conftool/dbconfig/20230504-182238-ladsgroup.json
  • 18:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2050.codfw.wmnet
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T335845)', diff saved to https://phabricator.wikimedia.org/P47650 and previous config saved to /var/cache/conftool/dbconfig/20230504-182139-ladsgroup.json
  • 18:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 18:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T335845)', diff saved to https://phabricator.wikimedia.org/P47649 and previous config saved to /var/cache/conftool/dbconfig/20230504-182114-ladsgroup.json
  • 18:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T335838)', diff saved to https://phabricator.wikimedia.org/P47648 and previous config saved to /var/cache/conftool/dbconfig/20230504-181851-ladsgroup.json
  • 18:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 18:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 18:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T335838)', diff saved to https://phabricator.wikimedia.org/P47647 and previous config saved to /var/cache/conftool/dbconfig/20230504-181828-ladsgroup.json
  • 18:17 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3316 (T335845)', diff saved to https://phabricator.wikimedia.org/P47646 and previous config saved to /var/cache/conftool/dbconfig/20230504-181636-ladsgroup.json
  • 18:16 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.7 refs T330213
  • 18:15 cmooney@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2011']
  • 18:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3315 (T335845)', diff saved to https://phabricator.wikimedia.org/P47645 and previous config saved to /var/cache/conftool/dbconfig/20230504-181516-ladsgroup.json
  • 18:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 18:14 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2050.codfw.wmnet
  • 18:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T335845)', diff saved to https://phabricator.wikimedia.org/P47644 and previous config saved to /var/cache/conftool/dbconfig/20230504-181451-ladsgroup.json
  • 18:14 cmooney@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2011']
  • 18:13 cmooney@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2011']
  • 18:12 fab@deploy1002: Finished deploy [airflow-dags/research@88ebdf7]: (no justification provided) (duration: 00m 28s)
  • 18:12 fab@deploy1002: Started deploy [airflow-dags/research@88ebdf7]: (no justification provided)
  • 18:12 cmooney@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2011']
  • 18:11 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 18:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1049.eqiad.wmnet
  • 18:08 brennen: train 1.41.0-wmf.7 (T330213): logs fairly quiet and no current blockers, rolling to group2
  • 18:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P47643 and previous config saved to /var/cache/conftool/dbconfig/20230504-180608-ladsgroup.json
  • 18:05 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 18:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2049.codfw.wmnet
  • 18:04 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 18:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1049.eqiad.wmnet
  • 18:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P47642 and previous config saved to /var/cache/conftool/dbconfig/20230504-180322-ladsgroup.json
  • 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P47641 and previous config saved to /var/cache/conftool/dbconfig/20230504-175945-ladsgroup.json
  • 17:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2049.codfw.wmnet
  • 17:54 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 17:53 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 17:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1048.eqiad.wmnet
  • 17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P47640 and previous config saved to /var/cache/conftool/dbconfig/20230504-175102-ladsgroup.json
  • 17:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2048.codfw.wmnet
  • 17:48 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 17:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P47639 and previous config saved to /var/cache/conftool/dbconfig/20230504-174815-ladsgroup.json
  • 17:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P47638 and previous config saved to /var/cache/conftool/dbconfig/20230504-174438-ladsgroup.json
  • 17:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1048.eqiad.wmnet
  • 17:42 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2048.codfw.wmnet
  • 17:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T335838)', diff saved to https://phabricator.wikimedia.org/P47637 and previous config saved to /var/cache/conftool/dbconfig/20230504-174040-ladsgroup.json
  • 17:37 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 17:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T335845)', diff saved to https://phabricator.wikimedia.org/P47635 and previous config saved to /var/cache/conftool/dbconfig/20230504-173555-ladsgroup.json
  • 17:35 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aphlict1002.eqiad.wmnet
  • 17:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T335838)', diff saved to https://phabricator.wikimedia.org/P47634 and previous config saved to /var/cache/conftool/dbconfig/20230504-173309-ladsgroup.json
  • 17:32 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 17:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2047.codfw.wmnet
  • 17:32 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 17:31 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host aphlict1002.eqiad.wmnet
  • 17:31 mutante: people1003 - rebooting
  • 17:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1047.eqiad.wmnet
  • 17:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on people1003.eqiad.wmnet with reason: maintenance upgrade
  • 17:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on people1003.eqiad.wmnet with reason: maintenance upgrade
  • 17:30 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aphlict2001.codfw.wmnet
  • 17:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T335845)', diff saved to https://phabricator.wikimedia.org/P47633 and previous config saved to /var/cache/conftool/dbconfig/20230504-172932-ladsgroup.json
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T335845)', diff saved to https://phabricator.wikimedia.org/P47632 and previous config saved to /var/cache/conftool/dbconfig/20230504-172835-ladsgroup.json
  • 17:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T335845)', diff saved to https://phabricator.wikimedia.org/P47631 and previous config saved to /var/cache/conftool/dbconfig/20230504-172806-ladsgroup.json
  • 17:26 eoghan@cumin1001: START - Cookbook sre.hosts.reboot-single for host aphlict2001.codfw.wmnet
  • 17:25 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 17:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T335838)', diff saved to https://phabricator.wikimedia.org/P47630 and previous config saved to /var/cache/conftool/dbconfig/20230504-172546-ladsgroup.json
  • 17:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 17:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P47629 and previous config saved to /var/cache/conftool/dbconfig/20230504-172534-ladsgroup.json
  • 17:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 17:25 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2047.codfw.wmnet
  • 17:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47628 and previous config saved to /var/cache/conftool/dbconfig/20230504-172523-ladsgroup.json
  • 17:24 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1047.eqiad.wmnet
  • 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T335845)', diff saved to https://phabricator.wikimedia.org/P47627 and previous config saved to /var/cache/conftool/dbconfig/20230504-172228-ladsgroup.json
  • 17:22 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs10[11-21].eqiad.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 17:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 17:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T335845)', diff saved to https://phabricator.wikimedia.org/P47626 and previous config saved to /var/cache/conftool/dbconfig/20230504-172204-ladsgroup.json
  • 17:16 mutante: aphlict2001 - not active, rebooting
  • 17:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2046.codfw.wmnet
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P47625 and previous config saved to /var/cache/conftool/dbconfig/20230504-171300-ladsgroup.json
  • 17:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1046.eqiad.wmnet
  • 17:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P47624 and previous config saved to /var/cache/conftool/dbconfig/20230504-171028-ladsgroup.json
  • 17:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P47623 and previous config saved to /var/cache/conftool/dbconfig/20230504-171017-ladsgroup.json
  • 17:09 brennen: phab1004 deployed and restarted, phab up, MR widget still seems to work
  • 17:08 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2046.codfw.wmnet
  • 17:08 brennen@deploy1002: Finished deploy [phabricator/deployment@0529926]: deploy latest state to phab1004 (duration: 00m 34s)
  • 17:07 brennen@deploy1002: Started deploy [phabricator/deployment@0529926]: deploy latest state to phab1004
  • 17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P47622 and previous config saved to /var/cache/conftool/dbconfig/20230504-170658-ladsgroup.json
  • 17:05 brennen@deploy1002: Finished deploy [phabricator/deployment@0529926]: deploy latest state to phab2002 (duration: 00m 37s)
  • 17:05 brennen@deploy1002: Started deploy [phabricator/deployment@0529926]: deploy latest state to phab2002
  • 17:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1046.eqiad.wmnet
  • 17:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: maintenance upgrade
  • 17:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: maintenance upgrade
  • 17:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: maintenance upgrade
  • 17:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: maintenance upgrade
  • 17:01 mutante: Phabricator upgrade - maintenance incoming
  • 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host lvs2011.codfw.wmnet
  • 16:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2045.codfw.wmnet
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P47621 and previous config saved to /var/cache/conftool/dbconfig/20230504-165753-ladsgroup.json
  • 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T335838)', diff saved to https://phabricator.wikimedia.org/P47620 and previous config saved to /var/cache/conftool/dbconfig/20230504-165521-ladsgroup.json
  • 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P47619 and previous config saved to /var/cache/conftool/dbconfig/20230504-165511-ladsgroup.json
  • 16:52 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2045.codfw.wmnet
  • 16:52 sbassett@deploy1002: Finished scap: Backport for Re-enable the Graph extension on test2wiki (T334940) (duration: 07m 04s)
  • 16:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P47618 and previous config saved to /var/cache/conftool/dbconfig/20230504-165152-ladsgroup.json
  • 16:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1045.eqiad.wmnet
  • 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T335838)', diff saved to https://phabricator.wikimedia.org/P47617 and previous config saved to /var/cache/conftool/dbconfig/20230504-164850-ladsgroup.json
  • 16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 16:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T335838)', diff saved to https://phabricator.wikimedia.org/P47616 and previous config saved to /var/cache/conftool/dbconfig/20230504-164826-ladsgroup.json
  • 16:46 sbassett@deploy1002: sbassett: Backport for Re-enable the Graph extension on test2wiki (T334940) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 16:46 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1045.eqiad.wmnet
  • 16:45 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: service=thumbor,name=thumbor100[1256].eqiad.wmnet
  • 16:45 sbassett@deploy1002: Started scap: Backport for Re-enable the Graph extension on test2wiki (T334940)
  • 16:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor100[1256].eqiad.wmnet
  • 16:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2044.codfw.wmnet
  • 16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T335845)', diff saved to https://phabricator.wikimedia.org/P47615 and previous config saved to /var/cache/conftool/dbconfig/20230504-164247-ladsgroup.json
  • 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47614 and previous config saved to /var/cache/conftool/dbconfig/20230504-164004-ladsgroup.json
  • 16:39 jynus: extending logical volume of backup1003, backup2003 for backup storage
  • 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T335845)', diff saved to https://phabricator.wikimedia.org/P47613 and previous config saved to /var/cache/conftool/dbconfig/20230504-163646-ladsgroup.json
  • 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T335845)', diff saved to https://phabricator.wikimedia.org/P47612 and previous config saved to /var/cache/conftool/dbconfig/20230504-163626-ladsgroup.json
  • 16:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 16:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 16:36 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2044.codfw.wmnet
  • 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T335845)', diff saved to https://phabricator.wikimedia.org/P47611 and previous config saved to /var/cache/conftool/dbconfig/20230504-163601-ladsgroup.json
  • 16:34 mutante: etherpad1003 (https://etherpad.wikimedia.org) rebooting, 1 min downtime
  • 16:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on etherpad1003.eqiad.wmnet with reason: reboot
  • 16:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on etherpad1003.eqiad.wmnet with reason: reboot
  • 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P47610 and previous config saved to /var/cache/conftool/dbconfig/20230504-163319-ladsgroup.json
  • 16:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1044.eqiad.wmnet
  • 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47609 and previous config saved to /var/cache/conftool/dbconfig/20230504-163149-ladsgroup.json
  • 16:30 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767 (duration: 152m 23s)
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T335845)', diff saved to https://phabricator.wikimedia.org/P47608 and previous config saved to /var/cache/conftool/dbconfig/20230504-162926-ladsgroup.json
  • 16:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 16:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T335845)', diff saved to https://phabricator.wikimedia.org/P47607 and previous config saved to /var/cache/conftool/dbconfig/20230504-162902-ladsgroup.json
  • 16:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gerrit1003.wikimedia.org with reason: reboot
  • 16:27 mutante: gerrit1003 (gerrit-new.wikimedia.org) - rebooting
  • 16:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on gerrit1003.wikimedia.org with reason: reboot
  • 16:26 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1044.eqiad.wmnet
  • 16:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2043.codfw.wmnet
  • 16:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2002.codfw.wmnet
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P47606 and previous config saved to /var/cache/conftool/dbconfig/20230504-162055-ladsgroup.json
  • 16:19 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2043.codfw.wmnet
  • 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P47605 and previous config saved to /var/cache/conftool/dbconfig/20230504-161813-ladsgroup.json
  • 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P47604 and previous config saved to /var/cache/conftool/dbconfig/20230504-161643-ladsgroup.json
  • 16:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2002.codfw.wmnet
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P47603 and previous config saved to /var/cache/conftool/dbconfig/20230504-161356-ladsgroup.json
  • 16:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1043.eqiad.wmnet
  • 16:12 mutante: doc1003 - rebooting
  • 16:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet
  • 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on doc1002.eqiad.wmnet with reason: reboot
  • 16:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on doc1002.eqiad.wmnet with reason: reboot
  • 16:10 mutante: doc1002 (https://doc.wikimedia.org) - reboot, <1 min downtime
  • 16:09 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2042.codfw.wmnet
  • 16:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet
  • 16:06 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1043.eqiad.wmnet
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P47602 and previous config saved to /var/cache/conftool/dbconfig/20230504-160547-ladsgroup.json
  • 16:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2002.codfw.wmnet
  • 16:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T335838)', diff saved to https://phabricator.wikimedia.org/P47601 and previous config saved to /var/cache/conftool/dbconfig/20230504-160307-ladsgroup.json
  • 16:02 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2042.codfw.wmnet
  • 16:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P47600 and previous config saved to /var/cache/conftool/dbconfig/20230504-160136-ladsgroup.json
  • 16:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2002.codfw.wmnet
  • 15:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2002.wikimedia.org
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P47599 and previous config saved to /var/cache/conftool/dbconfig/20230504-155850-ladsgroup.json
  • 15:58 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs10[11-21].eqiad.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 15:57 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs20[02-12].codfw.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T335838)', diff saved to https://phabricator.wikimedia.org/P47598 and previous config saved to /var/cache/conftool/dbconfig/20230504-155544-ladsgroup.json
  • 15:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 15:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T335838)', diff saved to https://phabricator.wikimedia.org/P47597 and previous config saved to /var/cache/conftool/dbconfig/20230504-155518-ladsgroup.json
  • 15:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2002.wikimedia.org
  • 15:54 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 15:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1002.wikimedia.org
  • 15:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1042.eqiad.wmnet
  • 15:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2041.codfw.wmnet
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T335845)', diff saved to https://phabricator.wikimedia.org/P47596 and previous config saved to /var/cache/conftool/dbconfig/20230504-155041-ladsgroup.json
  • 15:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1002.wikimedia.org
  • 15:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1042.eqiad.wmnet
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47595 and previous config saved to /var/cache/conftool/dbconfig/20230504-154630-ladsgroup.json
  • 15:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2041.codfw.wmnet
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T335845)', diff saved to https://phabricator.wikimedia.org/P47594 and previous config saved to /var/cache/conftool/dbconfig/20230504-154344-ladsgroup.json
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T335845)', diff saved to https://phabricator.wikimedia.org/P47593 and previous config saved to /var/cache/conftool/dbconfig/20230504-154211-ladsgroup.json
  • 15:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 15:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T335845)', diff saved to https://phabricator.wikimedia.org/P47592 and previous config saved to /var/cache/conftool/dbconfig/20230504-154146-ladsgroup.json
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47591 and previous config saved to /var/cache/conftool/dbconfig/20230504-154021-ladsgroup.json
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P47590 and previous config saved to /var/cache/conftool/dbconfig/20230504-154012-ladsgroup.json
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47589 and previous config saved to /var/cache/conftool/dbconfig/20230504-153850-ladsgroup.json
  • 15:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1183 (T335845)', diff saved to https://phabricator.wikimedia.org/P47588 and previous config saved to /var/cache/conftool/dbconfig/20230504-153834-ladsgroup.json
  • 15:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47587 and previous config saved to /var/cache/conftool/dbconfig/20230504-153825-ladsgroup.json
  • 15:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T335845)', diff saved to https://phabricator.wikimedia.org/P47586 and previous config saved to /var/cache/conftool/dbconfig/20230504-153810-ladsgroup.json
  • 15:38 mutante: doc2002 - rebooting
  • 15:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2040.codfw.wmnet
  • 15:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1041.eqiad.wmnet
  • 15:33 mutante: moscovium (https://rt.wikimedia.org) - rebooting
  • 15:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on moscovium.eqiad.wmnet with reason: reboot
  • 15:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on moscovium.eqiad.wmnet with reason: reboot
  • 15:27 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1041.eqiad.wmnet
  • 15:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P47585 and previous config saved to /var/cache/conftool/dbconfig/20230504-152640-ladsgroup.json
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P47584 and previous config saved to /var/cache/conftool/dbconfig/20230504-152506-ladsgroup.json
  • 15:24 marostegui: Failover m1-master from dbproxy1012 to dbproxy1014
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P47583 and previous config saved to /var/cache/conftool/dbconfig/20230504-152319-ladsgroup.json
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P47582 and previous config saved to /var/cache/conftool/dbconfig/20230504-152304-ladsgroup.json
  • 15:21 mutante: adding new project language 'gpe' - https://en.wikipedia.org/wiki/Ghanaian_Pidgin_English
  • 15:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1040.eqiad.wmnet
  • 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P47581 and previous config saved to /var/cache/conftool/dbconfig/20230504-151133-ladsgroup.json
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T335838)', diff saved to https://phabricator.wikimedia.org/P47580 and previous config saved to /var/cache/conftool/dbconfig/20230504-151000-ladsgroup.json
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P47579 and previous config saved to /var/cache/conftool/dbconfig/20230504-150813-ladsgroup.json
  • 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P47578 and previous config saved to /var/cache/conftool/dbconfig/20230504-150758-ladsgroup.json
  • 15:07 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1040.eqiad.wmnet
  • 15:03 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host lvs2011.codfw.wmnet
  • 15:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host lvs2011.codfw.wmnet
  • 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T335838)', diff saved to https://phabricator.wikimedia.org/P47576 and previous config saved to /var/cache/conftool/dbconfig/20230504-150336-ladsgroup.json
  • 15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 15:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 15:03 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host lvs2011.codfw.wmnet
  • 15:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T335838)', diff saved to https://phabricator.wikimedia.org/P47575 and previous config saved to /var/cache/conftool/dbconfig/20230504-150307-ladsgroup.json
  • 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T335845)', diff saved to https://phabricator.wikimedia.org/P47574 and previous config saved to /var/cache/conftool/dbconfig/20230504-145627-ladsgroup.json
  • 14:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1039.eqiad.wmnet
  • 14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47573 and previous config saved to /var/cache/conftool/dbconfig/20230504-145307-ladsgroup.json
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T335845)', diff saved to https://phabricator.wikimedia.org/P47572 and previous config saved to /var/cache/conftool/dbconfig/20230504-145251-ladsgroup.json
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47571 and previous config saved to /var/cache/conftool/dbconfig/20230504-145153-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T335845)', diff saved to https://phabricator.wikimedia.org/P47570 and previous config saved to /var/cache/conftool/dbconfig/20230504-144852-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T335845)', diff saved to https://phabricator.wikimedia.org/P47569 and previous config saved to /var/cache/conftool/dbconfig/20230504-144827-ladsgroup.json
  • 14:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P47568 and previous config saved to /var/cache/conftool/dbconfig/20230504-144801-ladsgroup.json
  • 14:47 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2011.codfw.wmnet with OS bullseye
  • 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T335845)', diff saved to https://phabricator.wikimedia.org/P47567 and previous config saved to /var/cache/conftool/dbconfig/20230504-144625-ladsgroup.json
  • 14:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 14:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 14:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 14:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T335845)', diff saved to https://phabricator.wikimedia.org/P47566 and previous config saved to /var/cache/conftool/dbconfig/20230504-144110-ladsgroup.json
  • 14:40 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 14:40 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2011.codfw.wmnet with OS bullseye
  • 14:38 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2040.codfw.wmnet
  • 14:36 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs20[02-12].codfw.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P47565 and previous config saved to /var/cache/conftool/dbconfig/20230504-143647-ladsgroup.json
  • 14:35 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 14:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1038.eqiad.wmnet
  • 14:34 eevans@cumin1001: END (ERROR) - Cookbook sre.cassandra.roll-restart (exit_code=97) for nodes matching aqs20[02-12].codfw.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P47564 and previous config saved to /var/cache/conftool/dbconfig/20230504-143320-ladsgroup.json
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P47563 and previous config saved to /var/cache/conftool/dbconfig/20230504-143255-ladsgroup.json
  • 14:28 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1038.eqiad.wmnet
  • 14:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2039.codfw.wmnet
  • 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P47562 and previous config saved to /var/cache/conftool/dbconfig/20230504-142604-ladsgroup.json
  • 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P47561 and previous config saved to /var/cache/conftool/dbconfig/20230504-142140-ladsgroup.json
  • 14:21 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2039.codfw.wmnet
  • 14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P47560 and previous config saved to /var/cache/conftool/dbconfig/20230504-141814-ladsgroup.json
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T335838)', diff saved to https://phabricator.wikimedia.org/P47559 and previous config saved to /var/cache/conftool/dbconfig/20230504-141749-ladsgroup.json
  • 14:17 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd1004.eqiad.wmnet
  • 14:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1037.eqiad.wmnet
  • 14:13 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd1004.eqiad.wmnet
  • 14:12 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd1005.eqiad.wmnet
  • 14:12 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs20[02-12].codfw.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 14:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2038.codfw.wmnet
  • 14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P47558 and previous config saved to /var/cache/conftool/dbconfig/20230504-141057-ladsgroup.json
  • 14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T335838)', diff saved to https://phabricator.wikimedia.org/P47557 and previous config saved to /var/cache/conftool/dbconfig/20230504-141024-ladsgroup.json
  • 14:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 14:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47556 and previous config saved to /var/cache/conftool/dbconfig/20230504-140958-ladsgroup.json
  • 14:09 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1037.eqiad.wmnet
  • 14:08 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd1005.eqiad.wmnet
  • 14:07 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd1006.eqiad.wmnet
  • 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47555 and previous config saved to /var/cache/conftool/dbconfig/20230504-140634-ladsgroup.json
  • 14:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2038.codfw.wmnet
  • 14:03 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd1006.eqiad.wmnet
  • 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T335845)', diff saved to https://phabricator.wikimedia.org/P47554 and previous config saved to /var/cache/conftool/dbconfig/20230504-140308-ladsgroup.json
  • 14:01 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd2004.codfw.wmnet
  • 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47553 and previous config saved to /var/cache/conftool/dbconfig/20230504-140012-ladsgroup.json
  • 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47552 and previous config saved to /var/cache/conftool/dbconfig/20230504-135845-ladsgroup.json
  • 13:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 13:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T335838)', diff saved to https://phabricator.wikimedia.org/P47551 and previous config saved to /var/cache/conftool/dbconfig/20230504-135821-ladsgroup.json
  • 13:58 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767
  • 13:57 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd2004.codfw.wmnet
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T335845)', diff saved to https://phabricator.wikimedia.org/P47550 and previous config saved to /var/cache/conftool/dbconfig/20230504-135637-ladsgroup.json
  • 13:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 13:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T335845)', diff saved to https://phabricator.wikimedia.org/P47549 and previous config saved to /var/cache/conftool/dbconfig/20230504-135612-ladsgroup.json
  • 13:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd2005.codfw.wmnet
  • 13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T335845)', diff saved to https://phabricator.wikimedia.org/P47548 and previous config saved to /var/cache/conftool/dbconfig/20230504-135551-ladsgroup.json
  • 13:54 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:54 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Make wbsubscribers API output sensible on Test Wikidata (T300458) (duration: 09m 52s)
  • 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P47547 and previous config saved to /var/cache/conftool/dbconfig/20230504-135452-ladsgroup.json
  • 13:53 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd2005.codfw.wmnet
  • 13:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd2006.codfw.wmnet
  • 13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T335845)', diff saved to https://phabricator.wikimedia.org/P47546 and previous config saved to /var/cache/conftool/dbconfig/20230504-135135-ladsgroup.json
  • 13:48 herron: switching to bullseye kafka monitoring hosts T335424
  • 13:48 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd2006.codfw.wmnet
  • 13:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd1004.eqiad.wmnet
  • 13:47 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for Make wbsubscribers API output sensible on Test Wikidata (T300458) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 13:45 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Make wbsubscribers API output sensible on Test Wikidata (T300458)
  • 13:43 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagetcd1004.eqiad.wmnet
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P47545 and previous config saved to /var/cache/conftool/dbconfig/20230504-134315-ladsgroup.json
  • 13:42 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd1005.eqiad.wmnet
  • 13:41 jdrewniak@deploy1002: Finished scap: Backport for Enable Vector 2022 as the default skin on frwikinews (T335686) (duration: 07m 47s)
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P47544 and previous config saved to /var/cache/conftool/dbconfig/20230504-134106-ladsgroup.json
  • 13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P47543 and previous config saved to /var/cache/conftool/dbconfig/20230504-133945-ladsgroup.json
  • 13:39 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagetcd1005.eqiad.wmnet
  • 13:38 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:37 elukey: revert "Grant IdempotentWrite Kafka Cluster ACL to User:ANONYMOUS in kafka logging clusters - T334733"
  • 13:37 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd1006.eqiad.wmnet
  • 13:37 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P47542 and previous config saved to /var/cache/conftool/dbconfig/20230504-133628-ladsgroup.json
  • 13:35 jdrewniak@deploy1002: jdrewniak: Backport for Enable Vector 2022 as the default skin on frwikinews (T335686) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:34 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagetcd1006.eqiad.wmnet
  • 13:33 jdrewniak@deploy1002: Started scap: Backport for Enable Vector 2022 as the default skin on frwikinews (T335686)
  • 13:33 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd2001.codfw.wmnet
  • 13:31 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagetcd2001.codfw.wmnet
  • 13:30 jdrewniak@deploy1002: Finished scap: Backport for Enable Vector 2022 as the default skin on eswiki (T335686) (duration: 08m 01s)
  • 13:30 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd2002.codfw.wmnet
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P47541 and previous config saved to /var/cache/conftool/dbconfig/20230504-132809-ladsgroup.json
  • 13:27 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagetcd2002.codfw.wmnet
  • 13:26 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd2003.codfw.wmnet
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P47540 and previous config saved to /var/cache/conftool/dbconfig/20230504-132600-ladsgroup.json
  • 13:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47539 and previous config saved to /var/cache/conftool/dbconfig/20230504-132439-ladsgroup.json
  • 13:23 jdrewniak@deploy1002: jdrewniak: Backport for Enable Vector 2022 as the default skin on eswiki (T335686) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:23 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:22 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 13:22 jdrewniak@deploy1002: Started scap: Backport for Enable Vector 2022 as the default skin on eswiki (T335686)
  • 13:22 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagetcd2003.codfw.wmnet
  • 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P47538 and previous config saved to /var/cache/conftool/dbconfig/20230504-132122-ladsgroup.json
  • 13:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47537 and previous config saved to /var/cache/conftool/dbconfig/20230504-131621-ladsgroup.json
  • 13:15 jdrewniak@deploy1002: Finished scap: Backport for [10%] Enable Vector 2022 as the default skin for eswiki (T335686) (duration: 08m 15s)
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T335838)', diff saved to https://phabricator.wikimedia.org/P47536 and previous config saved to /var/cache/conftool/dbconfig/20230504-131302-ladsgroup.json
  • 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T335845)', diff saved to https://phabricator.wikimedia.org/P47535 and previous config saved to /var/cache/conftool/dbconfig/20230504-131054-ladsgroup.json
  • 13:09 jdrewniak@deploy1002: jdrewniak: Backport for [10%] Enable Vector 2022 as the default skin for eswiki (T335686) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:06 jdrewniak@deploy1002: Started scap: Backport for [10%] Enable Vector 2022 as the default skin for eswiki (T335686)
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T335845)', diff saved to https://phabricator.wikimedia.org/P47534 and previous config saved to /var/cache/conftool/dbconfig/20230504-130616-ladsgroup.json
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T335845)', diff saved to https://phabricator.wikimedia.org/P47533 and previous config saved to /var/cache/conftool/dbconfig/20230504-130432-ladsgroup.json
  • 13:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 13:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P47532 and previous config saved to /var/cache/conftool/dbconfig/20230504-130115-ladsgroup.json
  • 12:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 12:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 12:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install1004.wikimedia.org
  • 12:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 12:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T335845)', diff saved to https://phabricator.wikimedia.org/P47531 and previous config saved to /var/cache/conftool/dbconfig/20230504-125309-ladsgroup.json
  • 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T335845)', diff saved to https://phabricator.wikimedia.org/P47530 and previous config saved to /var/cache/conftool/dbconfig/20230504-125250-ladsgroup.json
  • 12:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install1004.wikimedia.org
  • 12:48 moritzm: installing ruby-rack security updates
  • 12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P47529 and previous config saved to /var/cache/conftool/dbconfig/20230504-124609-ladsgroup.json
  • 12:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - ayounsi@cumin1001
  • 12:38 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - ayounsi@cumin1001
  • 12:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47528 and previous config saved to /var/cache/conftool/dbconfig/20230504-123103-ladsgroup.json
  • 12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47527 and previous config saved to /var/cache/conftool/dbconfig/20230504-122237-ladsgroup.json
  • 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47526 and previous config saved to /var/cache/conftool/dbconfig/20230504-122114-ladsgroup.json
  • 12:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 12:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T335838)', diff saved to https://phabricator.wikimedia.org/P47525 and previous config saved to /var/cache/conftool/dbconfig/20230504-122048-ladsgroup.json
  • 12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T335838)', diff saved to https://phabricator.wikimedia.org/P47524 and previous config saved to /var/cache/conftool/dbconfig/20230504-121247-ladsgroup.json
  • 12:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 12:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T335838)', diff saved to https://phabricator.wikimedia.org/P47523 and previous config saved to /var/cache/conftool/dbconfig/20230504-121224-ladsgroup.json
  • 12:10 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Fix output path of list=wbsubscribers API (T300458) (duration: 07m 43s)
  • 12:08 moritzm: installing libdatetime-timezone-perl updates
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P47522 and previous config saved to /var/cache/conftool/dbconfig/20230504-120542-ladsgroup.json
  • 12:04 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for Fix output path of list=wbsubscribers API (T300458) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 12:03 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Fix output path of list=wbsubscribers API (T300458)
  • 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P47521 and previous config saved to /var/cache/conftool/dbconfig/20230504-115717-ladsgroup.json
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P47520 and previous config saved to /var/cache/conftool/dbconfig/20230504-115035-ladsgroup.json
  • 11:44 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Fix output path of list=wbsubscribers API (T300458) (duration: 08m 24s)
  • 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P47519 and previous config saved to /var/cache/conftool/dbconfig/20230504-114211-ladsgroup.json
  • 11:38 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and migr: Backport for Fix output path of list=wbsubscribers API (T300458) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 11:38 kart_: Updated cxserver to 2023-05-03-044244-production (T333835, T335019, T331505)
  • 11:36 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Fix output path of list=wbsubscribers API (T300458)
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T335838)', diff saved to https://phabricator.wikimedia.org/P47518 and previous config saved to /var/cache/conftool/dbconfig/20230504-113529-ladsgroup.json
  • 11:34 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 11:33 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 11:31 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 11:31 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 11:30 moritzm: installing curl security updates (on buster)
  • 11:30 moritzm: installing curl security updates
  • 11:27 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 11:27 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T335838)', diff saved to https://phabricator.wikimedia.org/P47516 and previous config saved to /var/cache/conftool/dbconfig/20230504-112705-ladsgroup.json
  • 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T335838)', diff saved to https://phabricator.wikimedia.org/P47515 and previous config saved to /var/cache/conftool/dbconfig/20230504-112650-ladsgroup.json
  • 11:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T335838)', diff saved to https://phabricator.wikimedia.org/P47514 and previous config saved to /var/cache/conftool/dbconfig/20230504-112625-ladsgroup.json
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T335838)', diff saved to https://phabricator.wikimedia.org/P47513 and previous config saved to /var/cache/conftool/dbconfig/20230504-112041-ladsgroup.json
  • 11:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T335838)', diff saved to https://phabricator.wikimedia.org/P47512 and previous config saved to /var/cache/conftool/dbconfig/20230504-112017-ladsgroup.json
  • 11:15 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
  • 11:14 eoghan@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host aphlict2001.codfw.wmnet with OS bullseye
  • 11:13 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P47511 and previous config saved to /var/cache/conftool/dbconfig/20230504-111119-ladsgroup.json
  • 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P47510 and previous config saved to /var/cache/conftool/dbconfig/20230504-110511-ladsgroup.json
  • 11:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 5713
  • 11:04 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aphlict2001.codfw.wmnet with reason: host reimage
  • 11:03 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 5713
  • 11:01 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aphlict2001.codfw.wmnet with reason: host reimage
  • 10:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P47509 and previous config saved to /var/cache/conftool/dbconfig/20230504-105613-ladsgroup.json
  • 10:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 112
  • 10:54 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 112
  • 10:53 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install2004.wikimedia.org
  • 10:51 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P47508 and previous config saved to /var/cache/conftool/dbconfig/20230504-105005-ladsgroup.json
  • 10:48 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1001.eqiad.wmnet
  • 10:48 eoghan@cumin1001: START - Cookbook sre.ganeti.reimage for host aphlict2001.codfw.wmnet with OS bullseye
  • 10:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install2004.wikimedia.org
  • 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3002.wikimedia.org
  • 10:43 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-eqiad
  • 10:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T335838)', diff saved to https://phabricator.wikimedia.org/P47507 and previous config saved to /var/cache/conftool/dbconfig/20230504-104107-ladsgroup.json
  • 10:40 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster1001.eqiad.wmnet
  • 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3002.wikimedia.org
  • 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install4002.wikimedia.org
  • 10:40 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1002.eqiad.wmnet
  • 10:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install4002.wikimedia.org
  • 10:35 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-eqiad
  • 10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T335838)', diff saved to https://phabricator.wikimedia.org/P47506 and previous config saved to /var/cache/conftool/dbconfig/20230504-103459-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T335838)', diff saved to https://phabricator.wikimedia.org/P47505 and previous config saved to /var/cache/conftool/dbconfig/20230504-103434-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T335838)', diff saved to https://phabricator.wikimedia.org/P47504 and previous config saved to /var/cache/conftool/dbconfig/20230504-103409-ladsgroup.json
  • 10:30 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:30 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:28 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster1002.eqiad.wmnet
  • 10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T335838)', diff saved to https://phabricator.wikimedia.org/P47503 and previous config saved to /var/cache/conftool/dbconfig/20230504-102835-ladsgroup.json
  • 10:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 10:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T335838)', diff saved to https://phabricator.wikimedia.org/P47502 and previous config saved to /var/cache/conftool/dbconfig/20230504-102812-ladsgroup.json
  • 10:26 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2001.codfw.wmnet
  • 10:23 Amir1: Removing db1114 from zarcillo T335837
  • 10:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1114.eqiad.wmnet
  • 10:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1114.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 10:19 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1114.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P47501 and previous config saved to /var/cache/conftool/dbconfig/20230504-101903-ladsgroup.json
  • 10:17 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 10:17 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster2001.codfw.wmnet
  • 10:16 elukey@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:16 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 10:16 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 10:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2002.codfw.wmnet
  • 10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P47500 and previous config saved to /var/cache/conftool/dbconfig/20230504-101306-ladsgroup.json
  • 10:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1114.eqiad.wmnet
  • 10:10 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 10:05 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster2002.codfw.wmnet
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P47499 and previous config saved to /var/cache/conftool/dbconfig/20230504-100357-ladsgroup.json
  • 10:02 cgoubert@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
  • 09:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P47498 and previous config saved to /var/cache/conftool/dbconfig/20230504-095800-ladsgroup.json
  • 09:52 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor100[1256].eqiad.wmnet
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Remove db1114 from dbctl T335837', diff saved to https://phabricator.wikimedia.org/P47497 and previous config saved to /var/cache/conftool/dbconfig/20230504-094945-ladsgroup.json
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T335838)', diff saved to https://phabricator.wikimedia.org/P47496 and previous config saved to /var/cache/conftool/dbconfig/20230504-094850-ladsgroup.json
  • 09:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor200[3456].codfw.wmnet
  • 09:47 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 09:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T335838)', diff saved to https://phabricator.wikimedia.org/P47495 and previous config saved to /var/cache/conftool/dbconfig/20230504-094253-ladsgroup.json
  • 09:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T335838)', diff saved to https://phabricator.wikimedia.org/P47494 and previous config saved to /var/cache/conftool/dbconfig/20230504-094221-ladsgroup.json
  • 09:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 09:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T335838)', diff saved to https://phabricator.wikimedia.org/P47493 and previous config saved to /var/cache/conftool/dbconfig/20230504-094156-ladsgroup.json
  • 09:38 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 09:38 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 09:38 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for cloudbackup1001-dev.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
  • 09:37 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for cloudbackup1001-dev.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
  • 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1138 (T335838)', diff saved to https://phabricator.wikimedia.org/P47492 and previous config saved to /var/cache/conftool/dbconfig/20230504-093733-ladsgroup.json
  • 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T335838)', diff saved to https://phabricator.wikimedia.org/P47491 and previous config saved to /var/cache/conftool/dbconfig/20230504-093710-ladsgroup.json
  • 09:36 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:36 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1114 T335837', diff saved to https://phabricator.wikimedia.org/P47490 and previous config saved to /var/cache/conftool/dbconfig/20230504-093419-ladsgroup.json
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P47488 and previous config saved to /var/cache/conftool/dbconfig/20230504-092649-ladsgroup.json
  • 09:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P47487 and previous config saved to /var/cache/conftool/dbconfig/20230504-092203-ladsgroup.json
  • 09:20 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-codfw
  • 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2002.codfw.wmnet
  • 09:12 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-codfw
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P47486 and previous config saved to /var/cache/conftool/dbconfig/20230504-091143-ladsgroup.json
  • 09:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2002.codfw.wmnet
  • 09:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P47485 and previous config saved to /var/cache/conftool/dbconfig/20230504-090657-ladsgroup.json
  • 09:06 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 09:04 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T335838)', diff saved to https://phabricator.wikimedia.org/P47484 and previous config saved to /var/cache/conftool/dbconfig/20230504-085637-ladsgroup.json
  • 08:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T335838)', diff saved to https://phabricator.wikimedia.org/P47483 and previous config saved to /var/cache/conftool/dbconfig/20230504-085151-ladsgroup.json
  • 08:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T335838)', diff saved to https://phabricator.wikimedia.org/P47482 and previous config saved to /var/cache/conftool/dbconfig/20230504-085008-ladsgroup.json
  • 08:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 08:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 08:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T335838)', diff saved to https://phabricator.wikimedia.org/P47481 and previous config saved to /var/cache/conftool/dbconfig/20230504-084741-ladsgroup.json
  • 08:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 08:40 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 08:40 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 08:37 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes1006.eqiad.wmnet
  • 08:37 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes1006.eqiad.wmnet
  • 08:14 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 08:07 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:07 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:04 urbanecm@deploy1002: Finished scap: Backport for [Growth] Deploy Personalized praise to AR, BN, CS (T334630) (duration: 07m 24s)
  • 07:58 urbanecm@deploy1002: urbanecm: Backport for [Growth] Deploy Personalized praise to AR, BN, CS (T334630) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 07:57 urbanecm@deploy1002: Started scap: Backport for [Growth] Deploy Personalized praise to AR, BN, CS (T334630)
  • 07:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1132.eqiad.wmnet with reason: Onsite maintenance T334722
  • 07:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1132.eqiad.wmnet with reason: Onsite maintenance T334722
  • 07:56 urbanecm@deploy1002: Finished scap: Backport for ApiVisualEditor: Support preloading from i18n messages (T330337), Mentor dashboard: Move away from alpha/beta/stable (T334630) (duration: 07m 08s)
  • 07:50 urbanecm@deploy1002: urbanecm: Backport for ApiVisualEditor: Support preloading from i18n messages (T330337), Mentor dashboard: Move away from alpha/beta/stable (T334630) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 07:49 urbanecm@deploy1002: Started scap: Backport for ApiVisualEditor: Support preloading from i18n messages (T330337), Mentor dashboard: Move away from alpha/beta/stable (T334630)
  • 07:37 urbanecm@deploy1002: Finished scap: Backport for Mentor dashboard: Move away from alpha/beta/stable (T334630), EditPage: Support preloading from i18n messages (T330337), ApiVisualEditor: Support preloading from i18n messages (T330337), EditPage: Support preloading from i18n messages (T330337) (duration: 07m 58s)
  • 07:37 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 134823
  • 07:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 134823
  • 07:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 293
  • 07:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 293
  • 07:31 urbanecm@deploy1002: urbanecm: Backport for Mentor dashboard: Move away from alpha/beta/stable (T334630), EditPage: Support preloading from i18n messages (T330337), ApiVisualEditor: Support preloading from i18n messages (T330337), EditPage: Support preloading from i18n messages (T330337) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2
  • 07:30 urbanecm@deploy1002: Started scap: Backport for Mentor dashboard: Move away from alpha/beta/stable (T334630), EditPage: Support preloading from i18n messages (T330337), ApiVisualEditor: Support preloading from i18n messages (T330337), EditPage: Support preloading from i18n messages (T330337)
  • 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install5002.wikimedia.org
  • 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install5002.wikimedia.org
  • 07:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast2003.wikimedia.org with OS bookworm
  • 07:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6002.wikimedia.org
  • 07:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6002.wikimedia.org
  • 06:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on pc2011.codfw.wmnet with reason: Onsite maintenance T334722
  • 06:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp1002.wikimedia.org
  • 06:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on pc2011.codfw.wmnet with reason: Onsite maintenance T334722
  • 06:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast2003.wikimedia.org with reason: host reimage
  • 06:57 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Promote pc2014 to pc1 master (T334722) (duration: 07m 23s)
  • 06:54 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast2003.wikimedia.org with reason: host reimage
  • 06:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp1002.wikimedia.org
  • 06:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2002.codfw.wmnet
  • 06:52 marostegui: Promote pc2014 as pc1 master codfw dbmaint - T334722
  • 06:51 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Promote pc2014 to pc1 master (T334722) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 06:49 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Promote pc2014 to pc1 master (T334722)
  • 06:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2002.codfw.wmnet
  • 06:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2002.wikimedia.org
  • 06:46 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 06:44 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 06:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2002.wikimedia.org
  • 06:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast2003.wikimedia.org with OS bookworm
  • 06:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4004.wikimedia.org
  • 06:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bookworm
  • 06:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 06:25 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 06:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4004.wikimedia.org
  • 06:18 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 06:10 slyngshede@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts test-reimage2001.codfw.wmnet
  • 06:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: test-reimage2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1001"
  • 06:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 06:07 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: test-reimage2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1001"
  • 06:05 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 06:05 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 06:01 slyngshede@cumin1001: START - Cookbook sre.hosts.decommission for hosts test-reimage2001.codfw.wmnet
  • 05:59 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host bast5003.wikimedia.org
  • 05:54 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5003.wikimedia.org
  • 05:51 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 04:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reboot - ryankemper@cumin1001 - T335835
  • 04:51 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 04:50 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
  • 04:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 6 hosts with reason: Rolling reboot for T335835
  • 04:47 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 6 hosts with reason: Rolling reboot for T335835
  • 04:45 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster reboot - ryankemper@cumin1001 - T335835
  • 04:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on relforge[1003-1004].eqiad.wmnet with reason: Rolling reboot T335835
  • 04:38 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on relforge[1003-1004].eqiad.wmnet with reason: Rolling reboot T335835
  • 04:38 ryankemper: [Elastic] Reboot operation failed with (likely transient) read timeouts; will try again in 10 mins
  • 04:37 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 04:36 ryankemper: [Elastic] Beginning rolling reboot of eqiad elastic, 3 nodes at a time, `ryankemper@cumin1001` tmux session `reboot_eqiad`
  • 04:36 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T335835
  • 04:30 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 50 hosts with reason: Rolling reboot of eqiad for T335835
  • 04:29 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 50 hosts with reason: Rolling reboot of eqiad for T335835
  • 02:42 eileen: config revision changed from 5ac52d82 to 7ac11236 reduce batch size, avoid failmail
  • 02:35 eileen: config revision changed from 121a864a to 5ac52d82
  • 02:33 eileen: civicrm upgraded from b97aaa08 to 05523a9d
  • 01:29 eileen: config revision changed from 26147e89 to 121a864a - disabling populate as it keeps rolling back, so probably another overlong row

2023-05-03

  • 23:55 eileen: config revision changed from 2995f558 to 26147e89
  • 23:15 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - bking@cumin1001 - T335835
  • 23:10 tzatziki: removing 1 file for legal compliance
  • 23:01 eileen: config revision changed from 69f60bb9 to 2995f558
  • 22:42 zabe@deploy1002: Finished scap: Backport for Start writing to af_actor/afh_actor in group1 wikis (T334295) (duration: 07m 13s)
  • 22:37 zabe@deploy1002: zabe: Backport for Start writing to af_actor/afh_actor in group1 wikis (T334295) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:35 zabe@deploy1002: Started scap: Backport for Start writing to af_actor/afh_actor in group1 wikis (T334295)
  • 22:34 tzatziki: removing 12 files for legal compliance
  • 22:19 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs1010.eqiad.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 22:11 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs1010.eqiad.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 22:08 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs2001.codfw.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 22:04 eileen: civicrm upgraded from c6149ad2 to b97aaa08
  • 22:00 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs2001.codfw.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 21:55 brett: Disable puppet on lvs4008 for new pybal deployment (just in case immediate config rollback is required) - T263797
  • 21:43 milimetric@deploy1002: Finished deploy [analytics/refinery@c53c095] (thin): Deploy THIN [analytics/refinery@c53c095] (duration: 00m 06s)
  • 21:43 milimetric@deploy1002: Started deploy [analytics/refinery@c53c095] (thin): Deploy THIN [analytics/refinery@c53c095]
  • 21:31 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[17-33].eqiad.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 21:31 brett: Uploaded pybal_1.15.11 to apt1001 via reprepro
  • 21:31 milimetric@deploy1002: Finished deploy [analytics/refinery@c53c095]: Refinery deploy [analytics/refinery@c53c095] (duration: 08m 22s)
  • 21:22 milimetric@deploy1002: Started deploy [analytics/refinery@c53c095]: Refinery deploy [analytics/refinery@c53c095]
  • 21:11 brett: Upgrading pybal to 1.15.11 on lvs4010
  • 20:54 cjming: end of UTC late backport window
  • 20:53 cjming@deploy1002: Finished scap: Backport for Router handling code should be centralized into mmv.bootstrap (T236591) (duration: 10m 08s)
  • 20:44 cjming@deploy1002: cjming and jdlrobson: Backport for Router handling code should be centralized into mmv.bootstrap (T236591) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 20:43 cjming@deploy1002: Started scap: Backport for Router handling code should be centralized into mmv.bootstrap (T236591)
  • 20:42 cjming@deploy1002: Finished scap: Backport for Explicitly enable MFCustomSiteModules (T270603) (duration: 10m 23s)
  • 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T335838)', diff saved to https://phabricator.wikimedia.org/P47480 and previous config saved to /var/cache/conftool/dbconfig/20230503-203424-ladsgroup.json
  • 20:33 cjming@deploy1002: jdlrobson and cjming: Backport for Explicitly enable MFCustomSiteModules (T270603) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:32 cjming@deploy1002: Started scap: Backport for Explicitly enable MFCustomSiteModules (T270603)
  • 20:30 cjming@deploy1002: Finished scap: Backport for Enable graphs on test wikipedia and mediawiki.org (T334940) (duration: 08m 19s)
  • 20:23 cjming@deploy1002: cjming and jdlrobson: Backport for Enable graphs on test wikipedia and mediawiki.org (T334940) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:21 cjming@deploy1002: Started scap: Backport for Enable graphs on test wikipedia and mediawiki.org (T334940)
  • 20:19 cjming@deploy1002: Finished scap: Backport for Create autopatroller and patroller groups on bn.wikiquote (T335829) (duration: 08m 36s)
  • 20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P47479 and previous config saved to /var/cache/conftool/dbconfig/20230503-201918-ladsgroup.json
  • 20:14 fab@deploy1002: Finished deploy [airflow-dags/research@f8dad05]: (no justification provided) (duration: 00m 19s)
  • 20:13 fab@deploy1002: Started deploy [airflow-dags/research@f8dad05]: (no justification provided)
  • 20:13 cjming@deploy1002: cjming and mdsshakil: Backport for Create autopatroller and patroller groups on bn.wikiquote (T335829) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:11 cjming@deploy1002: Started scap: Backport for Create autopatroller and patroller groups on bn.wikiquote (T335829)
  • 20:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P47478 and previous config saved to /var/cache/conftool/dbconfig/20230503-200411-ladsgroup.json
  • 19:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T335838)', diff saved to https://phabricator.wikimedia.org/P47477 and previous config saved to /var/cache/conftool/dbconfig/20230503-194905-ladsgroup.json
  • 19:43 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - bking@cumin1001 - T335835
  • 19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T335838)', diff saved to https://phabricator.wikimedia.org/P47476 and previous config saved to /var/cache/conftool/dbconfig/20230503-194238-ladsgroup.json
  • 19:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 19:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T335838)', diff saved to https://phabricator.wikimedia.org/P47475 and previous config saved to /var/cache/conftool/dbconfig/20230503-194213-ladsgroup.json
  • 19:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T335838)', diff saved to https://phabricator.wikimedia.org/P47474 and previous config saved to /var/cache/conftool/dbconfig/20230503-194045-ladsgroup.json
  • 19:37 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - bking@cumin1001
  • 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P47473 and previous config saved to /var/cache/conftool/dbconfig/20230503-192707-ladsgroup.json
  • 19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P47472 and previous config saved to /var/cache/conftool/dbconfig/20230503-192538-ladsgroup.json
  • 19:20 inflatador: bking@cumin1001 reboot Elastic cluster for T335835
  • 19:19 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reboot - bking@cumin1001
  • 19:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P47471 and previous config saved to /var/cache/conftool/dbconfig/20230503-191200-ladsgroup.json
  • 19:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P47470 and previous config saved to /var/cache/conftool/dbconfig/20230503-191032-ladsgroup.json
  • 19:10 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:09 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T335838)', diff saved to https://phabricator.wikimedia.org/P47469 and previous config saved to /var/cache/conftool/dbconfig/20230503-185654-ladsgroup.json
  • 18:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T335838)', diff saved to https://phabricator.wikimedia.org/P47468 and previous config saved to /var/cache/conftool/dbconfig/20230503-185526-ladsgroup.json
  • 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T335838)', diff saved to https://phabricator.wikimedia.org/P47467 and previous config saved to /var/cache/conftool/dbconfig/20230503-185026-ladsgroup.json
  • 18:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 18:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 18:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 18:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T335838)', diff saved to https://phabricator.wikimedia.org/P47466 and previous config saved to /var/cache/conftool/dbconfig/20230503-184957-ladsgroup.json
  • 18:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T335838)', diff saved to https://phabricator.wikimedia.org/P47465 and previous config saved to /var/cache/conftool/dbconfig/20230503-184610-ladsgroup.json
  • 18:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 18:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T335838)', diff saved to https://phabricator.wikimedia.org/P47464 and previous config saved to /var/cache/conftool/dbconfig/20230503-184536-ladsgroup.json
  • 18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P47463 and previous config saved to /var/cache/conftool/dbconfig/20230503-183451-ladsgroup.json
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P47462 and previous config saved to /var/cache/conftool/dbconfig/20230503-183030-ladsgroup.json
  • 18:26 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[17-33].eqiad.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 18:26 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[13-27].codfw.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 18:22 brennen@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.7 refs T330213 (duration: 06m 18s)
  • 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P47461 and previous config saved to /var/cache/conftool/dbconfig/20230503-181944-ladsgroup.json
  • 18:16 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.7 refs T330213
  • 18:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P47460 and previous config saved to /var/cache/conftool/dbconfig/20230503-181524-ladsgroup.json
  • 18:08 brennen: train 1.41.0-wmf.7 (T330213): logs quiet and no current blockers, rolling to group1
  • 18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T335838)', diff saved to https://phabricator.wikimedia.org/P47459 and previous config saved to /var/cache/conftool/dbconfig/20230503-180438-ladsgroup.json
  • 18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T335838)', diff saved to https://phabricator.wikimedia.org/P47458 and previous config saved to /var/cache/conftool/dbconfig/20230503-180018-ladsgroup.json
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T335838)', diff saved to https://phabricator.wikimedia.org/P47457 and previous config saved to /var/cache/conftool/dbconfig/20230503-175806-ladsgroup.json
  • 17:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 17:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T335838)', diff saved to https://phabricator.wikimedia.org/P47456 and previous config saved to /var/cache/conftool/dbconfig/20230503-175404-ladsgroup.json
  • 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T335838)', diff saved to https://phabricator.wikimedia.org/P47455 and previous config saved to /var/cache/conftool/dbconfig/20230503-175340-ladsgroup.json
  • 17:53 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:52 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 17:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T335838)', diff saved to https://phabricator.wikimedia.org/P47454 and previous config saved to /var/cache/conftool/dbconfig/20230503-175126-ladsgroup.json
  • 17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T335838)', diff saved to https://phabricator.wikimedia.org/P47453 and previous config saved to /var/cache/conftool/dbconfig/20230503-174330-ladsgroup.json
  • 17:41 inflatador: bking@cumin1001 reboot wdqs20[13-22].codfw.wmnet T335835
  • 17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P47452 and previous config saved to /var/cache/conftool/dbconfig/20230503-173834-ladsgroup.json
  • 17:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P47451 and previous config saved to /var/cache/conftool/dbconfig/20230503-173620-ladsgroup.json
  • 17:32 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kafkamon2003.codfw.wmnet with OS bullseye
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P47450 and previous config saved to /var/cache/conftool/dbconfig/20230503-172824-ladsgroup.json
  • 17:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P47449 and previous config saved to /var/cache/conftool/dbconfig/20230503-172328-ladsgroup.json
  • 17:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2011.codfw.wmnet with OS bullseye
  • 17:22 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P47448 and previous config saved to /var/cache/conftool/dbconfig/20230503-172114-ladsgroup.json
  • 17:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:18 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage
  • 17:15 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P47447 and previous config saved to /var/cache/conftool/dbconfig/20230503-171317-ladsgroup.json
  • 17:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T335838)', diff saved to https://phabricator.wikimedia.org/P47446 and previous config saved to /var/cache/conftool/dbconfig/20230503-170821-ladsgroup.json
  • 17:07 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 17:07 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T335838)', diff saved to https://phabricator.wikimedia.org/P47445 and previous config saved to /var/cache/conftool/dbconfig/20230503-170607-ladsgroup.json
  • 17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2011.codfw.wmnet with reason: host reimage
  • 17:05 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767 (duration: 169m 01s)
  • 17:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2011.codfw.wmnet with reason: host reimage
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T335838)', diff saved to https://phabricator.wikimedia.org/P47444 and previous config saved to /var/cache/conftool/dbconfig/20230503-165954-ladsgroup.json
  • 16:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T335838)', diff saved to https://phabricator.wikimedia.org/P47443 and previous config saved to /var/cache/conftool/dbconfig/20230503-165920-ladsgroup.json
  • 16:58 herron@cumin1001: START - Cookbook sre.ganeti.reimage for host kafkamon2003.codfw.wmnet with OS bullseye
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T335838)', diff saved to https://phabricator.wikimedia.org/P47442 and previous config saved to /var/cache/conftool/dbconfig/20230503-165818-ladsgroup.json
  • 16:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T335838)', diff saved to https://phabricator.wikimedia.org/P47441 and previous config saved to /var/cache/conftool/dbconfig/20230503-165811-ladsgroup.json
  • 16:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T335838)', diff saved to https://phabricator.wikimedia.org/P47440 and previous config saved to /var/cache/conftool/dbconfig/20230503-165754-ladsgroup.json
  • 16:47 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 16:47 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 16:46 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 16:46 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T335838)', diff saved to https://phabricator.wikimedia.org/P47438 and previous config saved to /var/cache/conftool/dbconfig/20230503-164622-ladsgroup.json
  • 16:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 16:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T335838)', diff saved to https://phabricator.wikimedia.org/P47437 and previous config saved to /var/cache/conftool/dbconfig/20230503-164557-ladsgroup.json
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P47436 and previous config saved to /var/cache/conftool/dbconfig/20230503-164414-ladsgroup.json
  • 16:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2011.codfw.wmnet with OS bullseye
  • 16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P47435 and previous config saved to /var/cache/conftool/dbconfig/20230503-164248-ladsgroup.json
  • 16:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['lvs2011']
  • 16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P47434 and previous config saved to /var/cache/conftool/dbconfig/20230503-163051-ladsgroup.json
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P47433 and previous config saved to /var/cache/conftool/dbconfig/20230503-162908-ladsgroup.json
  • 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P47432 and previous config saved to /var/cache/conftool/dbconfig/20230503-162741-ladsgroup.json
  • 16:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2011']
  • 16:20 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['lvs2011']
  • 16:20 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2011']
  • 16:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2011']
  • 16:19 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2011']
  • 16:19 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['lvs2011']
  • 16:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2011']
  • 16:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P47431 and previous config saved to /var/cache/conftool/dbconfig/20230503-161545-ladsgroup.json
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T335838)', diff saved to https://phabricator.wikimedia.org/P47430 and previous config saved to /var/cache/conftool/dbconfig/20230503-161402-ladsgroup.json
  • 16:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs2011.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T335838)', diff saved to https://phabricator.wikimedia.org/P47429 and previous config saved to /var/cache/conftool/dbconfig/20230503-161235-ladsgroup.json
  • 16:08 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts puppetmaster2001.codfw.wmnet
  • 16:08 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts puppetmaster2001.codfw.wmnet
  • 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T335838)', diff saved to https://phabricator.wikimedia.org/P47428 and previous config saved to /var/cache/conftool/dbconfig/20230503-160601-ladsgroup.json
  • 16:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 16:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 16:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T335838)', diff saved to https://phabricator.wikimedia.org/P47427 and previous config saved to /var/cache/conftool/dbconfig/20230503-160146-ladsgroup.json
  • 16:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T335838)', diff saved to https://phabricator.wikimedia.org/P47426 and previous config saved to /var/cache/conftool/dbconfig/20230503-160039-ladsgroup.json
  • 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T335838)', diff saved to https://phabricator.wikimedia.org/P47425 and previous config saved to /var/cache/conftool/dbconfig/20230503-155946-ladsgroup.json
  • 15:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 15:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 15:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47424 and previous config saved to /var/cache/conftool/dbconfig/20230503-155506-ladsgroup.json
  • 15:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47423 and previous config saved to /var/cache/conftool/dbconfig/20230503-155221-ladsgroup.json
  • 15:48 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[13-27].codfw.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P47422 and previous config saved to /var/cache/conftool/dbconfig/20230503-154639-ladsgroup.json
  • 15:42 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 15:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
  • 15:41 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 15:40 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host lvs2011.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:40 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 15:40 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P47421 and previous config saved to /var/cache/conftool/dbconfig/20230503-154000-ladsgroup.json
  • 15:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
  • 15:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
  • 15:38 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:38 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for lvs2011 - pt1979@cumin2002"
  • 15:37 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts puppetmaster1002.eqiad.wmnet
  • 15:37 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts puppetmaster1002.eqiad.wmnet
  • 15:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P47420 and previous config saved to /var/cache/conftool/dbconfig/20230503-153715-ladsgroup.json
  • 15:36 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for lvs2011 - pt1979@cumin2002"
  • 15:34 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:34 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1016.eqiad.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 15:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
  • 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P47419 and previous config saved to /var/cache/conftool/dbconfig/20230503-153133-ladsgroup.json
  • 15:29 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P47418 and previous config saved to /var/cache/conftool/dbconfig/20230503-152453-ladsgroup.json
  • 15:24 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1016.eqiad.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 15:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P47417 and previous config saved to /var/cache/conftool/dbconfig/20230503-152208-ladsgroup.json
  • 15:17 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1016.eqiad.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T335838)', diff saved to https://phabricator.wikimedia.org/P47416 and previous config saved to /var/cache/conftool/dbconfig/20230503-151627-ladsgroup.json
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T335838)', diff saved to https://phabricator.wikimedia.org/P47415 and previous config saved to /var/cache/conftool/dbconfig/20230503-151013-ladsgroup.json
  • 15:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 15:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T335838)', diff saved to https://phabricator.wikimedia.org/P47414 and previous config saved to /var/cache/conftool/dbconfig/20230503-150947-ladsgroup.json
  • 15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47413 and previous config saved to /var/cache/conftool/dbconfig/20230503-150947-ladsgroup.json
  • 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47412 and previous config saved to /var/cache/conftool/dbconfig/20230503-150702-ladsgroup.json
  • 15:07 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1016.eqiad.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 15:03 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2012.codfw.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T335838)', diff saved to https://phabricator.wikimedia.org/P47411 and previous config saved to /var/cache/conftool/dbconfig/20230503-150103-ladsgroup.json
  • 15:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47410 and previous config saved to /var/cache/conftool/dbconfig/20230503-150042-ladsgroup.json
  • 15:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 15:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 15:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T335838)', diff saved to https://phabricator.wikimedia.org/P47409 and previous config saved to /var/cache/conftool/dbconfig/20230503-150017-ladsgroup.json
  • 14:59 sukhe: fix backup route for high-traffic2 in codfw: set routing-options static route 208.80.153.240/28 next-hop 10.192.17.7
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P47408 and previous config saved to /var/cache/conftool/dbconfig/20230503-145440-ladsgroup.json
  • 14:54 sukhe: [finished] homer "cr*-codfw*" commit "Gerrit: 914344 remove decommissioned host lvs2007": T335777
  • 14:53 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2012.codfw.wmnet: Upgrade Cassandra — T335383 - eevans@cumin1001
  • 14:52 sukhe: homer "cr*-codfw*" commit "Gerrit: 914344 remove decommissioned host lvs2007": T335777
  • 14:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs2007.codfw.wmnet
  • 14:46 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P47407 and previous config saved to /var/cache/conftool/dbconfig/20230503-144511-ladsgroup.json
  • 14:45 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 14:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm1001.wikimedia.org
  • 14:43 ottomata: Grant IdempotentWrite Kafka Cluster ACL to User:ANONYMOUS in kafka logging clusters - T334733
  • 14:40 jclark@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:40 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt frav1003 - jclark@cumin1001"
  • 14:40 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm1001.wikimedia.org
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P47406 and previous config saved to /var/cache/conftool/dbconfig/20230503-143933-ladsgroup.json
  • 14:38 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt frav1003 - jclark@cumin1001"
  • 14:36 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:36 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs2007.codfw.wmnet
  • 14:33 sukhe: set routing-options static route 208.80.153.224/28 next-hop 10.192.49.7 [move static route for high-traffic1 to lvs2010]: T335777
  • 14:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P47405 and previous config saved to /var/cache/conftool/dbconfig/20230503-143005-ladsgroup.json
  • 14:26 ottomata: Grant IdempotentWrite Kafka Cluster ACL to User:ANONYMOUS in kafka main clusters - T334733
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T335838)', diff saved to https://phabricator.wikimedia.org/P47404 and previous config saved to /var/cache/conftool/dbconfig/20230503-142427-ladsgroup.json
  • 14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T335838)', diff saved to https://phabricator.wikimedia.org/P47403 and previous config saved to /var/cache/conftool/dbconfig/20230503-141817-ladsgroup.json
  • 14:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 14:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47402 and previous config saved to /var/cache/conftool/dbconfig/20230503-141752-ladsgroup.json
  • 14:16 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767
  • 14:16 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 14:15 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 14:15 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 14:15 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 14:15 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T335838)', diff saved to https://phabricator.wikimedia.org/P47401 and previous config saved to /var/cache/conftool/dbconfig/20230503-141458-ladsgroup.json
  • 14:14 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 14:13 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 14:13 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 14:13 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 14:13 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 14:12 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:12 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:12 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:12 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 14:11 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 14:11 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:11 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 14:11 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:11 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:11 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 14:09 sukhe: stop pybal on lvs2007 to drain host for decommissioning
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T335838)', diff saved to https://phabricator.wikimedia.org/P47400 and previous config saved to /var/cache/conftool/dbconfig/20230503-140932-ladsgroup.json
  • 14:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:09 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Turn on experimental Parsoid Read Views support, except on commons & wikidata (T335157) (duration: 15m 27s)
  • 14:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T335838)', diff saved to https://phabricator.wikimedia.org/P47399 and previous config saved to /var/cache/conftool/dbconfig/20230503-140908-ladsgroup.json
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T335838)', diff saved to https://phabricator.wikimedia.org/P47398 and previous config saved to /var/cache/conftool/dbconfig/20230503-140540-ladsgroup.json
  • 14:04 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 14:04 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 14:03 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host kafkamon2003.codfw.wmnet
  • 14:03 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kafkamon2003.codfw.wmnet - herron@cumin1001"
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P47396 and previous config saved to /var/cache/conftool/dbconfig/20230503-140246-ladsgroup.json
  • 14:02 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kafkamon2003.codfw.wmnet - herron@cumin1001"
  • 13:55 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and cscott: Backport for Turn on experimental Parsoid Read Views support, except on commons & wikidata (T335157) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P47395 and previous config saved to /var/cache/conftool/dbconfig/20230503-135402-ladsgroup.json
  • 13:53 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Turn on experimental Parsoid Read Views support, except on commons & wikidata (T335157)
  • 13:52 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for wblistentityusage: Deprecate wbeu prefix, new output format (T300460 T196962) (duration: 27m 54s)
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P47394 and previous config saved to /var/cache/conftool/dbconfig/20230503-135034-ladsgroup.json
  • 13:48 cgoubert@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-codfw
  • 13:47 herron@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafkamon2003.codfw.wmnet on all recursors
  • 13:47 herron@cumin1001: START - Cookbook sre.dns.wipe-cache kafkamon2003.codfw.wmnet on all recursors
  • 13:47 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:47 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kafkamon2003.codfw.wmnet - herron@cumin1001"
  • 13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P47393 and previous config saved to /var/cache/conftool/dbconfig/20230503-134740-ladsgroup.json
  • 13:46 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kafkamon2003.codfw.wmnet - herron@cumin1001"
  • 13:43 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 13:43 herron@cumin1001: START - Cookbook sre.ganeti.makevm for new host kafkamon2003.codfw.wmnet
  • 13:43 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm1001.wikimedia.org
  • 13:42 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for wblistentityusage: Deprecate wbeu prefix, new output format (T300460 T196962) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 13:40 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm1001.wikimedia.org
  • 13:40 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm2001.wikimedia.org
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P47392 and previous config saved to /var/cache/conftool/dbconfig/20230503-133855-ladsgroup.json
  • 13:36 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm2001.wikimedia.org
  • 13:36 slyngshede@cumin1001: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM idm-test1001.wikimedia.org
  • 13:35 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host kafkamon1003.eqiad.wmnet
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P47391 and previous config saved to /var/cache/conftool/dbconfig/20230503-133528-ladsgroup.json
  • 13:34 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
  • 13:34 slyngshede@cumin1001: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM idm-test1001.wikimedia.org
  • 13:33 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
  • 13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47390 and previous config saved to /var/cache/conftool/dbconfig/20230503-133232-ladsgroup.json
  • 13:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47389 and previous config saved to /var/cache/conftool/dbconfig/20230503-133117-ladsgroup.json
  • 13:26 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafkamon1003.eqiad.wmnet
  • 13:24 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for wblistentityusage: Deprecate wbeu prefix, new output format (T300460 T196962)
  • 13:24 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:24 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T335838)', diff saved to https://phabricator.wikimedia.org/P47388 and previous config saved to /var/cache/conftool/dbconfig/20230503-132349-ladsgroup.json
  • 13:20 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for testwikidatawiki: enable entity labels in parsed API edit summaries (T335098) (duration: 17m 55s)
  • 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T335838)', diff saved to https://phabricator.wikimedia.org/P47387 and previous config saved to /var/cache/conftool/dbconfig/20230503-132022-ladsgroup.json
  • 13:19 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
  • 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T335838)', diff saved to https://phabricator.wikimedia.org/P47386 and previous config saved to /var/cache/conftool/dbconfig/20230503-131736-ladsgroup.json
  • 13:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 13:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 13:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T335838)', diff saved to https://phabricator.wikimedia.org/P47385 and previous config saved to /var/cache/conftool/dbconfig/20230503-131656-ladsgroup.json
  • 13:16 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:16 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P47384 and previous config saved to /var/cache/conftool/dbconfig/20230503-131611-ladsgroup.json
  • 13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T335838)', diff saved to https://phabricator.wikimedia.org/P47383 and previous config saved to /var/cache/conftool/dbconfig/20230503-131414-ladsgroup.json
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T335838)', diff saved to https://phabricator.wikimedia.org/P47382 and previous config saved to /var/cache/conftool/dbconfig/20230503-131249-ladsgroup.json
  • 13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T335838)', diff saved to https://phabricator.wikimedia.org/P47381 and previous config saved to /var/cache/conftool/dbconfig/20230503-131224-ladsgroup.json
  • 13:05 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and migr: Backport for testwikidatawiki: enable entity labels in parsed API edit summaries (T335098) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:02 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for testwikidatawiki: enable entity labels in parsed API edit summaries (T335098)
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P47380 and previous config saved to /var/cache/conftool/dbconfig/20230503-130149-ladsgroup.json
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P47379 and previous config saved to /var/cache/conftool/dbconfig/20230503-130105-ladsgroup.json
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P47378 and previous config saved to /var/cache/conftool/dbconfig/20230503-125718-ladsgroup.json
  • 12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P47377 and previous config saved to /var/cache/conftool/dbconfig/20230503-124643-ladsgroup.json
  • 12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47376 and previous config saved to /var/cache/conftool/dbconfig/20230503-124558-ladsgroup.json
  • 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P47375 and previous config saved to /var/cache/conftool/dbconfig/20230503-124212-ladsgroup.json
  • 12:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T335838)', diff saved to https://phabricator.wikimedia.org/P47374 and previous config saved to /var/cache/conftool/dbconfig/20230503-123837-ladsgroup.json
  • 12:37 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
  • 12:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T335838)', diff saved to https://phabricator.wikimedia.org/P47373 and previous config saved to /var/cache/conftool/dbconfig/20230503-123714-ladsgroup.json
  • 12:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T335838)', diff saved to https://phabricator.wikimedia.org/P47372 and previous config saved to /var/cache/conftool/dbconfig/20230503-123649-ladsgroup.json
  • 12:36 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
  • 12:35 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 12:31 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
  • 12:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T335838)', diff saved to https://phabricator.wikimedia.org/P47371 and previous config saved to /var/cache/conftool/dbconfig/20230503-123137-ladsgroup.json
  • 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T335838)', diff saved to https://phabricator.wikimedia.org/P47370 and previous config saved to /var/cache/conftool/dbconfig/20230503-122705-ladsgroup.json
  • 12:25 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 12:24 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 12:24 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P47369 and previous config saved to /var/cache/conftool/dbconfig/20230503-122143-ladsgroup.json
  • 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T335838)', diff saved to https://phabricator.wikimedia.org/P47368 and previous config saved to /var/cache/conftool/dbconfig/20230503-122113-ladsgroup.json
  • 12:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 12:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T335838)', diff saved to https://phabricator.wikimedia.org/P47367 and previous config saved to /var/cache/conftool/dbconfig/20230503-122049-ladsgroup.json
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T335838)', diff saved to https://phabricator.wikimedia.org/P47366 and previous config saved to /var/cache/conftool/dbconfig/20230503-122040-ladsgroup.json
  • 12:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 12:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 12:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 12:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T335838)', diff saved to https://phabricator.wikimedia.org/P47365 and previous config saved to /var/cache/conftool/dbconfig/20230503-122000-ladsgroup.json
  • 12:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2002.wikimedia.org
  • 12:15 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1002.eqiad.wmnet
  • 12:11 Amir1: Removing db1111 from zarcillo T335836
  • 12:09 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab2002.wikimedia.org
  • 12:09 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog1002.eqiad.wmnet
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P47364 and previous config saved to /var/cache/conftool/dbconfig/20230503-120637-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1111.eqiad.wmnet
  • 12:06 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
  • 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1111.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P47363 and previous config saved to /var/cache/conftool/dbconfig/20230503-120536-ladsgroup.json
  • 12:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P47362 and previous config saved to /var/cache/conftool/dbconfig/20230503-120453-ladsgroup.json
  • 12:02 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1111.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 11:56 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 11:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1111.eqiad.wmnet
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T335838)', diff saved to https://phabricator.wikimedia.org/P47361 and previous config saved to /var/cache/conftool/dbconfig/20230503-115130-ladsgroup.json
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1110 from dbctl T335011', diff saved to https://phabricator.wikimedia.org/P47360 and previous config saved to /var/cache/conftool/dbconfig/20230503-115124-marostegui.json
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P47359 and previous config saved to /var/cache/conftool/dbconfig/20230503-115030-ladsgroup.json
  • 11:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P47358 and previous config saved to /var/cache/conftool/dbconfig/20230503-114947-ladsgroup.json
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T335838)', diff saved to https://phabricator.wikimedia.org/P47357 and previous config saved to /var/cache/conftool/dbconfig/20230503-114426-ladsgroup.json
  • 11:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 11:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 11:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T335838)', diff saved to https://phabricator.wikimedia.org/P47356 and previous config saved to /var/cache/conftool/dbconfig/20230503-114335-ladsgroup.json
  • 11:40 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
  • 11:38 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes2015.codfw.wmnet
  • 11:38 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes2015.codfw.wmnet
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T335838)', diff saved to https://phabricator.wikimedia.org/P47355 and previous config saved to /var/cache/conftool/dbconfig/20230503-113524-ladsgroup.json
  • 11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T335838)', diff saved to https://phabricator.wikimedia.org/P47354 and previous config saved to /var/cache/conftool/dbconfig/20230503-113441-ladsgroup.json
  • 11:31 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
  • 11:28 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P47353 and previous config saved to /var/cache/conftool/dbconfig/20230503-112828-ladsgroup.json
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T335838)', diff saved to https://phabricator.wikimedia.org/P47352 and previous config saved to /var/cache/conftool/dbconfig/20230503-112819-ladsgroup.json
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T335838)', diff saved to https://phabricator.wikimedia.org/P47351 and previous config saved to /var/cache/conftool/dbconfig/20230503-112819-ladsgroup.json
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Remove db1111 from dbctl T335836', diff saved to https://phabricator.wikimedia.org/P47350 and previous config saved to /var/cache/conftool/dbconfig/20230503-112812-ladsgroup.json
  • 11:27 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
  • 11:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 11:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 11:23 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47349 and previous config saved to /var/cache/conftool/dbconfig/20230503-112037-root.json
  • 11:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T335838)', diff saved to https://phabricator.wikimedia.org/P47348 and previous config saved to /var/cache/conftool/dbconfig/20230503-111910-ladsgroup.json
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Decom db1111 T335836', diff saved to https://phabricator.wikimedia.org/P47347 and previous config saved to /var/cache/conftool/dbconfig/20230503-111904-ladsgroup.json
  • 11:18 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
  • 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy1017.eqiad.wmnet with reason: Upgrade
  • 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy1017.eqiad.wmnet with reason: Upgrade
  • 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy1016.eqiad.wmnet with reason: Upgrade
  • 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy1016.eqiad.wmnet with reason: Upgrade
  • 11:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy1015.eqiad.wmnet with reason: Upgrade
  • 11:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy1015.eqiad.wmnet with reason: Upgrade
  • 11:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy1014.eqiad.wmnet with reason: Upgrade
  • 11:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy1014.eqiad.wmnet with reason: Upgrade
  • 11:11 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-codfw
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P47346 and previous config saved to /var/cache/conftool/dbconfig/20230503-111145-ladsgroup.json
  • 11:08 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
  • 11:06 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster1001.eqiad.wmnet
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 75%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47345 and previous config saved to /var/cache/conftool/dbconfig/20230503-110532-root.json
  • 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P47344 and previous config saved to /var/cache/conftool/dbconfig/20230503-110357-ladsgroup.json
  • 11:02 marostegui: Reboot dbproxy200[1-4]
  • 11:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy[2001-2004].codfw.wmnet with reason: Reboot T335845
  • 11:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbproxy[2001-2004].codfw.wmnet with reason: Reboot T335845
  • 10:57 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster1001.eqiad.wmnet
  • 10:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T335838)', diff saved to https://phabricator.wikimedia.org/P47343 and previous config saved to /var/cache/conftool/dbconfig/20230503-105639-ladsgroup.json
  • 10:52 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: apply
  • 10:51 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/recommendation-api: apply
  • 10:51 claime: Migrating recommendation-api eqiad to mw-api-int-async - T334062
  • 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 50%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47342 and previous config saved to /var/cache/conftool/dbconfig/20230503-105028-root.json
  • 10:50 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: apply
  • 10:50 claime: Migrating recommendation-api codfw to mw-api-int-async - T334062
  • 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T335838)', diff saved to https://phabricator.wikimedia.org/P47341 and previous config saved to /var/cache/conftool/dbconfig/20230503-105004-ladsgroup.json
  • 10:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 10:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T335838)', diff saved to https://phabricator.wikimedia.org/P47340 and previous config saved to /var/cache/conftool/dbconfig/20230503-104939-ladsgroup.json
  • 10:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/recommendation-api: apply
  • 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P47339 and previous config saved to /var/cache/conftool/dbconfig/20230503-104851-ladsgroup.json
  • 10:47 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
  • 10:45 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host kubestagemaster2001.codfw.wmnet
  • 10:45 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/recommendation-api: apply
  • 10:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1005.eqiad.wmnet
  • 10:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2004.codfw.wmnet
  • 10:41 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
  • 10:40 claime: Migrating recommendation-api staging to mw-api-int-async - T334062
  • 10:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2004.codfw.wmnet
  • 10:38 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1004.eqiad.wmnet
  • 10:38 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1005.eqiad.wmnet
  • 10:35 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for wblistentityusage: Deprecate wbeu prefix, new output format (T300460 T196962) (duration: 34m 53s)
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 25%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47338 and previous config saved to /var/cache/conftool/dbconfig/20230503-103523-root.json
  • 10:35 volans@cumin1001: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Release v3.2.9-wmf2 to netbox-next - volans@cumin1001
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P47337 and previous config saved to /var/cache/conftool/dbconfig/20230503-103433-ladsgroup.json
  • 10:34 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2001.codfw.wmnet
  • 10:33 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kubestagemaster2001.codfw.wmnet
  • 10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T335838)', diff saved to https://phabricator.wikimedia.org/P47336 and previous config saved to /var/cache/conftool/dbconfig/20230503-103345-ladsgroup.json
  • 10:33 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2001.codfw.wmnet
  • 10:32 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1004.eqiad.wmnet
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T335838)', diff saved to https://phabricator.wikimedia.org/P47335 and previous config saved to /var/cache/conftool/dbconfig/20230503-102719-ladsgroup.json
  • 10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T335838)', diff saved to https://phabricator.wikimedia.org/P47334 and previous config saved to /var/cache/conftool/dbconfig/20230503-102654-ladsgroup.json
  • 10:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2003.wikimedia.org
  • 10:21 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 10%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47333 and previous config saved to /var/cache/conftool/dbconfig/20230503-102018-root.json
  • 10:19 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab2003.wikimedia.org
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P47332 and previous config saved to /var/cache/conftool/dbconfig/20230503-101926-ladsgroup.json
  • 10:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 10:18 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 10:18 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and migr: Backport for wblistentityusage: Deprecate wbeu prefix, new output format (T300460 T196962) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 10:18 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host gitlab2003.wikimedia.org
  • 10:17 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp2001.codfw.wmnet
  • 10:16 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on aphlict1001.eqiad.wmnet with reason: aphlict1002 is now active
  • 10:16 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on aphlict1001.eqiad.wmnet with reason: aphlict1002 is now active
  • 10:13 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P47331 and previous config saved to /var/cache/conftool/dbconfig/20230503-101147-ladsgroup.json
  • 10:10 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host arclamp2001.codfw.wmnet
  • 10:10 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2003.codfw.wmnet
  • 10:09 cgoubert@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-eqiad
  • 10:09 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp1001.eqiad.wmnet
  • 10:07 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2004.codfw.wmnet
  • 10:07 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host webperf2003.codfw.wmnet
  • 10:07 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
  • 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 5%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47330 and previous config saved to /var/cache/conftool/dbconfig/20230503-100513-root.json
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T335838)', diff saved to https://phabricator.wikimedia.org/P47329 and previous config saved to /var/cache/conftool/dbconfig/20230503-100420-ladsgroup.json
  • 10:03 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
  • 10:02 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host arclamp1001.eqiad.wmnet
  • 10:00 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2004.codfw.wmnet
  • 10:00 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for wblistentityusage: Deprecate wbeu prefix, new output format (T300460 T196962)
  • 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T335838)', diff saved to https://phabricator.wikimedia.org/P47328 and previous config saved to /var/cache/conftool/dbconfig/20230503-095901-ladsgroup.json
  • 09:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 09:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P47327 and previous config saved to /var/cache/conftool/dbconfig/20230503-095641-ladsgroup.json
  • 09:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 09:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 09:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1217.eqiad.wmnet with reason: Cloning db1110 from db1217:3323 T335092
  • 09:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1217.eqiad.wmnet with reason: Cloning db1110 from db1217:3323 T335092
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 3%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47325 and previous config saved to /var/cache/conftool/dbconfig/20230503-095008-root.json
  • 09:49 urbanecm@deploy1002: Finished scap: Backport for Personalized praise: Run convertNumber() before displaying numbers (T322443), Personalized praise: Run convertNumber() before displaying numbers (T322443) (duration: 06m 53s)
  • 09:47 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
  • 09:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-codfw
  • 09:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 4:00:00 on db1110.eqiad.wmnet with reason: Moving to m3 T335092
  • 09:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 4:00:00 on db1110.eqiad.wmnet with reason: Moving to m3 T335092
  • 09:42 urbanecm@deploy1002: Started scap: Backport for Personalized praise: Run convertNumber() before displaying numbers (T322443), Personalized praise: Run convertNumber() before displaying numbers (T322443)
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T335838)', diff saved to https://phabricator.wikimedia.org/P47324 and previous config saved to /var/cache/conftool/dbconfig/20230503-094135-ladsgroup.json
  • 09:36 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2114 (T335838)', diff saved to https://phabricator.wikimedia.org/P47323 and previous config saved to /var/cache/conftool/dbconfig/20230503-093606-ladsgroup.json
  • 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 1%: Repooling after migrating', diff saved to https://phabricator.wikimedia.org/P47322 and previous config saved to /var/cache/conftool/dbconfig/20230503-093503-root.json
  • 09:29 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.2 - volans@cumin1001
  • 09:29 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab2003.wikimedia.org
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 100%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47321 and previous config saved to /var/cache/conftool/dbconfig/20230503-092856-root.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 100%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47320 and previous config saved to /var/cache/conftool/dbconfig/20230503-092847-root.json
  • 09:28 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.2 - volans@cumin1001
  • 09:26 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2124', diff saved to https://phabricator.wikimedia.org/P47319 and previous config saved to /var/cache/conftool/dbconfig/20230503-092513-root.json
  • 09:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2124.codfw.wmnet with reason: Migrating to 10.6 and rebooting
  • 09:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test2001.codfw.wmnet
  • 09:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2124.codfw.wmnet with reason: Migrating to 10.6 and rebooting
  • 09:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1003.wikimedia.org
  • 09:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test2001.codfw.wmnet
  • 09:20 urbanecm@deploy1002: Finished scap: Backport for [Growth] Add GEMentorDashboardEnabledModules (T334630) (duration: 06m 56s)
  • 09:20 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test1001.eqiad.wmnet
  • 09:17 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1003.wikimedia.org
  • 09:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test1001.eqiad.wmnet
  • 09:14 urbanecm@deploy1002: Started scap: Backport for [Growth] Add GEMentorDashboardEnabledModules (T334630)
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 75%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47318 and previous config saved to /var/cache/conftool/dbconfig/20230503-091352-root.json
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 75%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47317 and previous config saved to /var/cache/conftool/dbconfig/20230503-091342-root.json
  • 09:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief2001.codfw.wmnet
  • 09:11 urbanecm@deploy1002: sync-world aborted: Backport for Personalized praise: Let mentors to skip suggestions (T334300) (duration: 00m 06s)
  • 09:11 urbanecm@deploy1002: Started scap: Backport for Personalized praise: Let mentors to skip suggestions (T334300)
  • 09:08 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief2001.codfw.wmnet
  • 09:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1001.eqiad.wmnet
  • 09:02 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief1001.eqiad.wmnet
  • 09:01 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host acmechief1001.eqiad.wmnet
  • 09:01 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief1001.eqiad.wmnet
  • 09:00 urbanecm@deploy1002: Finished scap: Backport for Personalized praise: Let mentors to skip suggestions (T334300) (duration: 27m 39s)
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 50%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47316 and previous config saved to /var/cache/conftool/dbconfig/20230503-085847-root.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 50%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47315 and previous config saved to /var/cache/conftool/dbconfig/20230503-085837-root.json
  • 08:44 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 25%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47314 and previous config saved to /var/cache/conftool/dbconfig/20230503-084342-root.json
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 25%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47313 and previous config saved to /var/cache/conftool/dbconfig/20230503-084332-root.json
  • 08:39 marostegui: dbmaint deploy schema change on eqiad s3 with replication T335834
  • 08:39 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
  • 08:32 urbanecm@deploy1002: Started scap: Backport for Personalized praise: Let mentors to skip suggestions (T334300)
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 10%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47312 and previous config saved to /var/cache/conftool/dbconfig/20230503-082837-root.json
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 10%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47311 and previous config saved to /var/cache/conftool/dbconfig/20230503-082827-root.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 5%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47310 and previous config saved to /var/cache/conftool/dbconfig/20230503-081332-root.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 5%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47309 and previous config saved to /var/cache/conftool/dbconfig/20230503-081323-root.json
  • 08:04 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 4%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47308 and previous config saved to /var/cache/conftool/dbconfig/20230503-075828-root.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 4%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47307 and previous config saved to /var/cache/conftool/dbconfig/20230503-075818-root.json
  • 07:57 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
  • 07:48 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 3%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47306 and previous config saved to /var/cache/conftool/dbconfig/20230503-074323-root.json
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 3%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47305 and previous config saved to /var/cache/conftool/dbconfig/20230503-074313-root.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 T335011', diff saved to https://phabricator.wikimedia.org/P47304 and previous config saved to /var/cache/conftool/dbconfig/20230503-073602-root.json
  • 07:28 taavi@deploy1002: Finished scap: Backport for Remove duplicated diff-mode selector in save dialog (T324759) (duration: 10m 14s)
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 2%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47303 and previous config saved to /var/cache/conftool/dbconfig/20230503-072818-root.json
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 2%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47302 and previous config saved to /var/cache/conftool/dbconfig/20230503-072808-root.json
  • 07:26 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 07:20 taavi@deploy1002: taavi and samwilson: Backport for Remove duplicated diff-mode selector in save dialog (T324759) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 07:18 taavi@deploy1002: Started scap: Backport for Remove duplicated diff-mode selector in save dialog (T324759)
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 1%: Pooling db1213:3316 T326669', diff saved to https://phabricator.wikimedia.org/P47299 and previous config saved to /var/cache/conftool/dbconfig/20230503-071313-root.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 1%: Pooling db1213:3315 T326669', diff saved to https://phabricator.wikimedia.org/P47298 and previous config saved to /var/cache/conftool/dbconfig/20230503-071303-root.json
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1213 (s5,s6) to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P47297 and previous config saved to /var/cache/conftool/dbconfig/20230503-071046-marostegui.json
  • 07:09 moritzm: installing glibc bugfix updates from bullseye point release
  • 07:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1117.eqiad.wmnet
  • 07:02 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:02 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1117.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 07:01 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1117.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 06:56 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:50 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1117.eqiad.wmnet
  • 06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 38 hosts with reason: Disconnecting codfw > eqiad T335267
  • 06:46 marostegui: Disconnect codfw -> eqiad replication on s1 T335267
  • 06:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 38 hosts with reason: Disconnecting codfw > eqiad T335267
  • 06:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 06:28 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 06:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 34 hosts with reason: Disconnecting codfw > eqiad T335267
  • 06:14 marostegui: Disconnect codfw -> eqiad replication on s8 T335267
  • 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 34 hosts with reason: Disconnecting codfw > eqiad T335267
  • 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 35 hosts with reason: Disconnecting codfw > eqiad T335267
  • 06:09 marostegui: Disconnect codfw -> eqiad replication on s4 T335267
  • 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 35 hosts with reason: Disconnecting codfw > eqiad T335267
  • 06:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 28 hosts with reason: Disconnecting codfw > eqiad T335267
  • 06:06 marostegui: Disconnect codfw -> eqiad replication on s7 T335267
  • 06:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 28 hosts with reason: Disconnecting codfw > eqiad T335267
  • 06:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 24 hosts with reason: Disconnecting codfw > eqiad T335267
  • 06:01 marostegui: Disconnect codfw -> eqiad replication on s3 T335267
  • 06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 24 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 26 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:59 marostegui: Disconnect codfw -> eqiad replication on s5 T335267
  • 05:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 26 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 27 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:57 marostegui: Disconnect codfw -> eqiad replication on s2 T335267
  • 05:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 27 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 27 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:54 marostegui: Disconnect codfw -> eqiad replication on s6 T335267
  • 05:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 27 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 6 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:51 marostegui: Disconnect codfw -> eqiad replication on es5 T335267
  • 05:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 6 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 6 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:48 marostegui: Disconnect codfw -> eqiad replication on es4 T335267
  • 05:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 6 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 10 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:44 marostegui: Disconnect codfw -> eqiad replication on x1 T335267
  • 05:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 10 hosts with reason: Disconnecting codfw > eqiad T335267
  • 05:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on pc2013.codfw.wmnet,pc1013.eqiad.wmnet with reason: Disconnecting codfw > eqiad T335267
  • 05:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on pc2013.codfw.wmnet,pc1013.eqiad.wmnet with reason: Disconnecting codfw > eqiad T335267
  • 05:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on pc2012.codfw.wmnet,pc1012.eqiad.wmnet with reason: Disconnecting codfw > eqiad T335267
  • 05:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on pc2012.codfw.wmnet,pc1012.eqiad.wmnet with reason: Disconnecting codfw > eqiad T335267
  • 05:40 marostegui: Disconnect codfw -> eqiad replication on pc3 T335267
  • 05:40 marostegui: Disconnect codfw -> eqiad replication on pc2 T335267
  • 05:40 marostegui: Disconnect codfw -> eqiad replication on pc1 T335267
  • 05:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: Disconnecting codfw > eqiad T335267
  • 05:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: Disconnecting codfw > eqiad T335267
  • 01:24 eileen: civicrm upgraded from 09d2eefd to c6149ad2
  • 00:58 ejegg: civicrm upgraded from 8426761b to 09d2eefd
  • 00:47 sukhe: restart haproxy on cp2031: T334448
  • 00:42 eileen: civicrm upgraded from 8076995a to 8426761b

2023-05-02

  • 23:28 eileen: config revision changed from 18acbe1a to 69f60bb9
  • 22:56 eileen: config revision changed from 2eef4039 to 18acbe1a
  • 22:53 eileen: civicrm upgraded from a3d84de3 to 8076995a
  • 20:36 urbanecm@deploy1002: Finished scap: Backport for [Growth] Finish Personalized praise variable rename (T334630) (duration: 06m 55s)
  • 20:30 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 20:30 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 20:29 urbanecm@deploy1002: Started scap: Backport for [Growth] Finish Personalized praise variable rename (T334630)
  • 20:25 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@daf8c32]: bump mjolnir to v2.3.0 (duration: 00m 28s)
  • 20:24 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@daf8c32]: bump mjolnir to v2.3.0
  • 20:20 urbanecm@deploy1002: Finished scap: Backport for Switch on creating Babel categories in Russian Wiktionary (T335136) (duration: 15m 47s)
  • 20:06 urbanecm@deploy1002: urbanecm and iniquity: Backport for Switch on creating Babel categories in Russian Wiktionary (T335136) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:04 urbanecm@deploy1002: Started scap: Backport for Switch on creating Babel categories in Russian Wiktionary (T335136)
  • 19:40 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw
  • 19:40 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw
  • 19:28 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:28 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:18 bking@deploy1002: Finished deploy [wdqs/wdqs@0e051d8]: (no justification provided) (duration: 00m 05s)
  • 19:18 bking@deploy1002: Started deploy [wdqs/wdqs@0e051d8]: (no justification provided)
  • 19:13 bking@deploy1002: Finished deploy [wdqs/wdqs@0e051d8]: (no justification provided) (duration: 00m 16s)
  • 19:12 bking@deploy1002: Started deploy [wdqs/wdqs@0e051d8]: (no justification provided)
  • 19:12 bking@deploy1002: Finished deploy [wdqs/wdqs@0e051d8]: (no justification provided) (duration: 07m 13s)
  • 19:05 bking@deploy1002: Started deploy [wdqs/wdqs@0e051d8]: (no justification provided)
  • 19:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on wdqs2022.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 19:04 bking@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on wdqs2022.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 18:56 bking@deploy1002: Finished deploy [wdqs/wdqs@0e051d8]: (no justification provided) (duration: 00m 03s)
  • 18:56 bking@deploy1002: Started deploy [wdqs/wdqs@0e051d8]: (no justification provided)
  • 18:56 bking@deploy1002: Finished deploy [wdqs/wdqs@0e051d8]: (no justification provided) (duration: 00m 19s)
  • 18:55 bking@deploy1002: Started deploy [wdqs/wdqs@0e051d8]: (no justification provided)
  • 18:50 bking@cumin1001: conftool action : set/pooled=inactive; selector: name=wdqs2022.codfw.wmnet
  • 18:38 ejegg: civicrm upgraded from e7904ea6 to a3d84de3
  • 18:19 milimetric@deploy1002: Finished deploy [analytics/refinery@c42021f] (thin): Regular analytics weekly train THIN [analytics/refinery@c42021f] (duration: 00m 07s)
  • 18:19 milimetric@deploy1002: Started deploy [analytics/refinery@c42021f] (thin): Regular analytics weekly train THIN [analytics/refinery@c42021f]
  • 18:11 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.7 refs T330213
  • 18:03 brennen: train 1.41.0-wmf.7 (T330213): (correction: group0 today)
  • 18:01 brennen: train 1.41.0-wmf.7 (T330213): no current blockers, rolling to group1 with `scap train`
  • 17:45 milimetric@deploy1002: Finished deploy [analytics/refinery@c42021f]: Regular analytics weekly train [analytics/refinery@c42021f] (duration: 06m 26s)
  • 17:39 milimetric@deploy1002: Started deploy [analytics/refinery@c42021f]: Regular analytics weekly train [analytics/refinery@c42021f]
  • 17:36 sukhe: cr*-eqiad: delete backup routes for ns0: delete routing-options static route 208.80.153.231/32: T330670
  • 17:34 sukhe: [correction] cr*-codfw: delete backup routes for ns0: delete routing-options static route 208.80.154.238/32: T330670
  • 17:32 sukhe: cr*-codfw: delete backup routes for ns1: delete routing-options static route 208.80.154.238/32: T330670
  • 17:28 sukhe: ns1 set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.77 208.80.153.111 208.80.153.10 ]: T330670
  • 17:25 sukhe: ns0 set routing-options static route 208.80.154.238/32 next-hop [ 208.80.154.10 208.80.155.108 208.80.154.134 ]: T330670
  • 17:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 16:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 16:38 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 16:24 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 16:16 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@bb96aca]: Add snappy dependency for kafka daemons (duration: 00m 26s)
  • 16:16 sukhe: ns0 backup routes: delete routing-options static route 208.80.154.238/32 next-hop 208.80.153.111, set to 208.80.153.77
  • 16:16 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@bb96aca]: Add snappy dependency for kafka daemons
  • 16:12 sukhe: ns1: delete routing-options static route 208.80.153.231/32 next-hop 208.80.153.111, set to 208.80.153.77
  • 16:11 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 16:10 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 16:08 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 16:06 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 15:39 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:38 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 15:36 claime: Re-running puppet on failed parse servers - T313227
  • 15:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 0:00:00 on wdqs2022.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 15:35 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:34 bking@cumin1001: START - Cookbook sre.hosts.downtime for 12 days, 0:00:00 on wdqs2022.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 15:33 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 15:16 jiji@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: codfw row C switches upgrade - T334049
  • 15:13 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:12 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 15:09 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:04 claime: enabling puppet on parse2014
  • 15:04 claime: enabling puppet on parse2013
  • 15:02 akosiaris: enable puppet on parse1005
  • 15:00 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:00 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:59 jiji@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: codfw row C switches upgrade - T334049
  • 14:59 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:58 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 14:56 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:55 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:54 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:53 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 14:52 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:40 moritzm: installing intel-microcode security updates on bullseye servers
  • 14:40 akosiaris: emergency disabling of puppet on parse hosts
  • 14:33 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: sync
  • 14:33 claime: Merging new internal certs for api, jobrunner, appservers, parsoid - T313227
  • 14:29 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: sync
  • 14:27 denisse: sync prometheus3001 -> prometheus3002
  • 14:27 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 14:23 _joe_: also on contint1002, the current ci master
  • 14:22 _joe_: restarted zuul on contint2001
  • 14:07 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 13:51 sukhe: run authdns-update to repool codfw
  • 13:47 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid
  • 13:47 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=appserver
  • 13:47 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=api_appserver
  • 13:45 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 13:37 urbanecm@deploy1002: Finished scap: Backport for Enable Kartographer Nearby on mobile (T333137), Fix clearing wrong container when closing fullscreen map (T335648), Fix clearing wrong container when closing fullscreen map (T335648) (duration: 14m 54s)
  • 13:25 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 13:24 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2005.wikimedia.org
  • 13:24 urbanecm@deploy1002: wmde-fisch and urbanecm: Backport for Enable Kartographer Nearby on mobile (T333137), Fix clearing wrong container when closing fullscreen map (T335648), Fix clearing wrong container when closing fullscreen map (T335648) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 13:22 urbanecm@deploy1002: Started scap: Backport for Enable Kartographer Nearby on mobile (T333137), Fix clearing wrong container when closing fullscreen map (T335648), Fix clearing wrong container when closing fullscreen map (T335648)
  • 13:16 urbanecm@deploy1002: Sync cancelled.
  • 13:16 urbanecm@deploy1002: urbanecm and wmde-fisch: Backport for Enable Kartographer Nearby on mobile (T333137) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:07 urbanecm@deploy1002: Started scap: Backport for Enable Kartographer Nearby on mobile (T333137)
  • 13:05 XioNoX: rebooting asw-c-codfw for software upgrade - T334049
  • 13:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 185 hosts with reason: codfw row C upgrade
  • 13:01 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 185 hosts with reason: codfw row C upgrade
  • 12:54 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on 186 hosts with reason: codfw row C upgrade
  • 12:54 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 186 hosts with reason: codfw row C upgrade
  • 12:31 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check 2 services: maintenance
  • 12:31 eevans@cumin1001: START - Cookbook sre.discovery.service-route check 2 services: maintenance
  • 12:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:29 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:28 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 12:24 moritzm: installing Linux 5.10.178 on bullseye hosts
  • 12:20 sukhe: run authdns-update to depool codfw: T334049
  • 12:17 Amir1: stop slave on eqiad masters of s1, x1, s8 (T334049)
  • 12:10 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica2005.wikimedia.org
  • 12:05 Amir1: stop slave again on db1130 (eqiad master of s5) (T334049)
  • 12:03 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup1011
  • 12:03 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host backup1011
  • 11:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2003.codfw.wmnet
  • 11:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host poolcounter2003.codfw.wmnet
  • 11:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup1011
  • 11:52 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host backup1011
  • 11:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup1010
  • 11:52 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host backup1010
  • 11:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetmaster1006
  • 11:51 Amir1: stop slave on db1130 (eqiad master of s5) (T334049)
  • 11:51 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 11:50 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host puppetmaster1006
  • 11:50 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest1003
  • 11:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2004.codfw.wmnet
  • 11:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 41 hosts with reason: Row c switch maint T334049
  • 11:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on 41 hosts with reason: Row c switch maint T334049
  • 11:49 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest1003
  • 11:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host poolcounter2004.codfw.wmnet
  • 11:32 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:30 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 10:52 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all active/active services in codfw: codfw row C switches upgrade - T334049
  • 10:47 ladsgroup@deploy1002: Finished scap: Backport for Set externallinks migration to read new in testwiki (T335343) (duration: 13m 27s)
  • 10:35 ladsgroup@deploy1002: ladsgroup: Backport for Set externallinks migration to read new in testwiki (T335343) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 10:33 ladsgroup@deploy1002: Started scap: Backport for Set externallinks migration to read new in testwiki (T335343)
  • 10:00 ladsgroup@deploy1002: Finished scap: Backport for Remove 1024px and 1920px from pre-gen thumbsizes (T211661) (duration: 08m 40s)
  • 09:59 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter depool all active/active services in codfw: codfw row C switches upgrade - T334049
  • 09:53 ladsgroup@deploy1002: ladsgroup: Backport for Remove 1024px and 1920px from pre-gen thumbsizes (T211661) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 09:51 ladsgroup@deploy1002: Started scap: Backport for Remove 1024px and 1920px from pre-gen thumbsizes (T211661)
  • 09:21 eoghan@cumin1001: END (ERROR) - Cookbook sre.gitlab.failover (exit_code=97) Failover of gitlab from gitlab2002.wikimedia.org to gitlab1004.wikimedia.org
  • 09:13 ladsgroup@deploy1002: scap failed: CalledProcessError Command 'sudo -u mwbuilder /usr/local/bin/update-mediawiki-tools-release' returned non-zero exit status 1. (duration: 00m 05s)
  • 09:12 ladsgroup@deploy1002: Started scap: Backport for Remove 1024px and 1920px from pre-gen thumbsizes (T211661)
  • 09:10 eoghan@cumin1001: START - Cookbook sre.gitlab.failover Failover of gitlab from gitlab2002.wikimedia.org to gitlab1004.wikimedia.org
  • 08:51 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 08:44 vgutierrez: testing haproxy 2.6.12-1~bpo10+1+wmf1 in cp1077 and cp1085 - T334448
  • 08:40 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 08:28 moritzm: updated netboot image for Bullseye 11.7 T335575
  • 08:27 XioNoX: stage Junos 21 on asw-c-codfw - T334049
  • 08:07 godog: upgrade grafana to 9.3.13
  • 07:49 tgr_: UTC morning deploys done
  • 07:48 tgr@deploy1002: Finished scap: Backport for OAuth: Do not require approval for read-only grants on public wikis (T67750) (duration: 07m 39s)
  • 07:42 tgr@deploy1002: tgr: Backport for OAuth: Do not require approval for read-only grants on public wikis (T67750) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 07:40 tgr@deploy1002: Started scap: Backport for OAuth: Do not require approval for read-only grants on public wikis (T67750)
  • 07:38 tgr@deploy1002: Finished scap: Backport for [noop] Disable section image recommendations in production (T329276) (duration: 07m 29s)
  • 07:32 tgr@deploy1002: tgr: Backport for [noop] Disable section image recommendations in production (T329276) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 07:30 tgr@deploy1002: Started scap: Backport for [noop] Disable section image recommendations in production (T329276)
  • 07:11 taavi@deploy1002: Finished scap: Backport for Enable WikiLove extension on bnwikibooks (T335705) (duration: 07m 59s)
  • 07:05 taavi@deploy1002: taavi and mdsshakil: Backport for Enable WikiLove extension on bnwikibooks (T335705) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 07:03 taavi@deploy1002: Started scap: Backport for Enable WikiLove extension on bnwikibooks (T335705)
  • 06:57 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Cmjohnson out of all services on: 794 hosts
  • 06:57 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Cmjohnson out of all services on: 794 hosts
  • 06:57 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Cmjohnson out of all services on: 1274 hosts
  • 06:56 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Cmjohnson out of all services on: 1274 hosts
  • 06:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 136106
  • 06:07 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 136106
  • 06:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 293
  • 06:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 293
  • 06:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 48237
  • 06:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 48237
  • 06:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 132132
  • 06:03 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 132132
  • 06:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 10089
  • 06:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 10089
  • 05:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 17961
  • 05:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 17961
  • 03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.5 (duration: 02m 17s)
  • 03:52 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.7 refs T330213 (duration: 49m 21s)
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.7 refs T330213

2023-05-01

  • 22:24 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 22:23 legoktm@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 22:23 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 22:23 legoktm@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 22:23 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 22:22 legoktm@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 22:22 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 22:22 legoktm@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 22:22 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 22:21 legoktm@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 22:19 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 22:18 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 22:18 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 22:17 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 22:17 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 22:16 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 22:16 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 22:16 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 22:15 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 22:14 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 21:57 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 21:56 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 21:56 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 21:56 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 21:56 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 21:56 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 21:55 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 21:55 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 21:55 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 21:55 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 21:47 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 21:47 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 21:47 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 21:47 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 21:15 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 17 hosts with reason: T334049 maint
  • 21:15 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 17 hosts with reason: T334049 maint
  • 21:08 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic[2045-2048,2059,2065-2066,2071,2081-2083]* for row C switch upgrade - bking@cumin1001 - T334049
  • 21:08 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic[2045-2048,2059,2065-2066,2071,2081-2083]* for row C switch upgrade - bking@cumin1001 - T334049
  • 21:08 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic[2045-2048,2059,2065-2066,2071,2081-2083] for row C switch upgrade - bking@cumin1001 - T334049
  • 21:08 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic[2045-2048,2059,2065-2066,2071,2081-2083] for row C switch upgrade - bking@cumin1001 - T334049
  • 21:02 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in eqiad: maintenance
  • 20:57 eevans@cumin1001: START - Cookbook sre.discovery.service-route pool sessionstore in eqiad: maintenance
  • 20:47 urandom: upgrading sessionstore1003 to Cassandra 3.11.14 — T335383
  • 20:45 urandom: upgrading sessionstore1002 to Cassandra 3.11.14 — T335383
  • 20:42 urandom: upgrading sessionstore1001 to Cassandra 3.11.14 — T335383
  • 20:38 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool sessionstore in eqiad: maintenance
  • 20:33 eevans@cumin1001: START - Cookbook sre.discovery.service-route depool sessionstore in eqiad: maintenance
  • 20:13 taavi@deploy1002: Finished scap: Backport for Point SyntaxHighlight at /srv/app/pygmentize (T320848) (duration: 08m 12s)
  • 20:06 taavi@deploy1002: legoktm and taavi: Backport for Point SyntaxHighlight at /srv/app/pygmentize (T320848) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:05 taavi@deploy1002: Started scap: Backport for Point SyntaxHighlight at /srv/app/pygmentize (T320848)
  • 19:42 dancy@deploy1002: Installation of scap version "4.52.0" completed for 593 hosts
  • 19:41 dancy@deploy1002: Installing scap version "4.52.0" for 593 hosts
  • 19:06 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:06 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:58 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:58 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:56 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 18:56 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 17:32 sukhe: run authdns-update for CR 913966
  • 16:54 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in codfw: maintenance
  • 16:49 eevans@cumin1001: START - Cookbook sre.discovery.service-route pool sessionstore in codfw: maintenance
  • 16:44 urbanecm@deploy1002: Finished scap: Backport for dewiki: Deploy Growth features to 100% of newcomers (T335385) (duration: 07m 22s)
  • 16:38 urbanecm@deploy1002: urbanecm: Backport for dewiki: Deploy Growth features to 100% of newcomers (T335385) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 16:37 urbanecm@deploy1002: Started scap: Backport for dewiki: Deploy Growth features to 100% of newcomers (T335385)
  • 16:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2001.codfw.wmnet
  • 16:26 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2001.codfw.wmnet
  • 16:22 urandom: upgrading sessionstore2003 to Cassandra 3.11.14 — T335383
  • 16:19 urandom: upgrading sessionstore2002 to Cassandra 3.11.14 — T335383
  • 16:03 urandom: upgrading sessionstore2001 to Cassandra 3.11.14 — T335383
  • 15:59 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool sessionstore in codfw: maintenance
  • 15:54 eevans@cumin1001: START - Cookbook sre.discovery.service-route depool sessionstore in codfw: maintenance
  • 14:58 sukhe: restart haproxy on cp1077: T334448
  • 14:09 sukhe: move backup routes for ns0 from dns2001 to dns2002: T334049
  • 14:04 sukhe: move ns1 from dns2001 to dns2002: T334049
  • 13:19 taavi@deploy1002: Finished scap: Backport for [slwiki] Enable VisualEditor on Draft and Project namespaces (T335208), [frwikibooks] Change the logo for Vector legacy and add a wordmark for Vector 2022 (T335642), Close nawiki (T335674) (duration: 07m 59s)
  • 13:12 taavi@deploy1002: superpes and taavi: Backport for [slwiki] Enable VisualEditor on Draft and Project namespaces (T335208), [frwikibooks] Change the logo for Vector legacy and add a wordmark for Vector 2022 (T335642), Close nawiki (T335674) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:11 taavi@deploy1002: Started scap: Backport for [slwiki] Enable VisualEditor on Draft and Project namespaces (T335208), [frwikibooks] Change the logo for Vector legacy and add a wordmark for Vector 2022 (T335642), Close nawiki (T335674)
  • 11:41 zabe@deploy1002: Finished scap: Backport for Start writing to af_actor/afh_actor in group0 wikis (T334295) (duration: 21m 08s)
  • 11:31 zabe@deploy1002: zabe: Backport for Start writing to af_actor/afh_actor in group0 wikis (T334295) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 11:20 zabe@deploy1002: Started scap: Backport for Start writing to af_actor/afh_actor in group0 wikis (T334295)
  • 11:20 zabe@deploy1002: sync-world aborted: Backport for Start writing to af_actor/afh_actor in group0 wikis (T334295) (duration: 02m 33s)
  • 11:17 zabe@deploy1002: Started scap: Backport for Start writing to af_actor/afh_actor in group0 wikis (T334295)
