Server Admin Log

From Wikitech

2025-06-23

  • 07:42 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 07:41 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 07:37 hashar@deploy1003: Finished scap sync-world: Backport for ApiQueryZFunctionReference: Return an actual empty array instead of [false] (T396978), captureSpeedtest: Drop PHP 7 check, no longer needed, diffConfig: Add a quick list of affected wikis to the end of the output (duration: 41m 07s)
  • 07:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T396130)', diff saved to https://phabricator.wikimedia.org/P78586 and previous config saved to /var/cache/conftool/dbconfig/20250623-073316-marostegui.json
  • 07:31 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78585 and previous config saved to /var/cache/conftool/dbconfig/20250623-073145-root.json
  • 07:31 stevemunene@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
  • 07:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T396130)', diff saved to https://phabricator.wikimedia.org/P78584 and previous config saved to /var/cache/conftool/dbconfig/20250623-072542-marostegui.json
  • 07:25 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 07:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T396130)', diff saved to https://phabricator.wikimedia.org/P78583 and previous config saved to /var/cache/conftool/dbconfig/20250623-072519-marostegui.json
  • 07:25 stevemunene@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
  • 07:24 hashar@deploy1003: hashar, jforrester: Continuing with sync
  • 07:20 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: Maintenance
  • 07:18 hashar@deploy1003: hashar, jforrester: Backport for ApiQueryZFunctionReference: Return an actual empty array instead of [false] (T396978), captureSpeedtest: Drop PHP 7 check, no longer needed, diffConfig: Add a quick list of affected wikis to the end of the output synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78582 and previous config saved to /var/cache/conftool/dbconfig/20250623-071639-root.json
  • 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78581 and previous config saved to /var/cache/conftool/dbconfig/20250623-071618-root.json
  • 07:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P78580 and previous config saved to /var/cache/conftool/dbconfig/20250623-071011-marostegui.json
  • 07:06 marostegui: Failover m5 from db1228 to db1164 - T397413
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P78579 and previous config saved to /var/cache/conftool/dbconfig/20250623-070134-root.json
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78578 and previous config saved to /var/cache/conftool/dbconfig/20250623-070112-root.json
  • 06:58 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2235].codfw.wmnet,db[1164,1217,1228].eqiad.wmnet with reason: m5 master switch T397413
  • 06:56 hashar@deploy1003: Started scap sync-world: Backport for ApiQueryZFunctionReference: Return an actual empty array instead of [false] (T396978), captureSpeedtest: Drop PHP 7 check, no longer needed, diffConfig: Add a quick list of affected wikis to the end of the output
  • 06:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P78577 and previous config saved to /var/cache/conftool/dbconfig/20250623-065503-marostegui.json
  • 06:48 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2235].codfw.wmnet,db[1164,1217,1228].eqiad.wmnet with reason: m5 master switch T397413
  • 06:46 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P78576 and previous config saved to /var/cache/conftool/dbconfig/20250623-064628-root.json
  • 06:46 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78575 and previous config saved to /var/cache/conftool/dbconfig/20250623-064606-root.json
  • 06:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2215 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78574 and previous config saved to /var/cache/conftool/dbconfig/20250623-064358-root.json
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T396130)', diff saved to https://phabricator.wikimedia.org/P78573 and previous config saved to /var/cache/conftool/dbconfig/20250623-063956-marostegui.json
  • 06:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T396130)', diff saved to https://phabricator.wikimedia.org/P78572 and previous config saved to /var/cache/conftool/dbconfig/20250623-063217-marostegui.json
  • 06:32 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T396130)', diff saved to https://phabricator.wikimedia.org/P78571 and previous config saved to /var/cache/conftool/dbconfig/20250623-063155-marostegui.json
  • 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78570 and previous config saved to /var/cache/conftool/dbconfig/20250623-063123-root.json
  • 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78569 and previous config saved to /var/cache/conftool/dbconfig/20250623-063100-root.json
  • 06:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2039 T397599', diff saved to https://phabricator.wikimedia.org/P78568 and previous config saved to /var/cache/conftool/dbconfig/20250623-063050-marostegui.json
  • 06:29 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2038 to es7 primary and set section read-write T397599', diff saved to https://phabricator.wikimedia.org/P78567 and previous config saved to /var/cache/conftool/dbconfig/20250623-062949-marostegui.json
  • 06:29 marostegui: Starting es7 codfw failover from es2039 to es2038 - T397599
  • 06:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2215 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78566 and previous config saved to /var/cache/conftool/dbconfig/20250623-062852-root.json
  • 06:24 marostegui@cumin1002: dbctl commit (dc=all): 'Set es2038 with weight 0 T397599', diff saved to https://phabricator.wikimedia.org/P78565 and previous config saved to /var/cache/conftool/dbconfig/20250623-062420-root.json
  • 06:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es7 T397599
  • 06:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: T397597
  • 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P78564 and previous config saved to /var/cache/conftool/dbconfig/20250623-061648-marostegui.json
  • 06:15 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P78563 and previous config saved to /var/cache/conftool/dbconfig/20250623-061554-root.json
  • 06:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2035 T397597', diff saved to https://phabricator.wikimedia.org/P78562 and previous config saved to /var/cache/conftool/dbconfig/20250623-061511-marostegui.json
  • 06:14 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2037 to es6 primary and set section read-write T397597', diff saved to https://phabricator.wikimedia.org/P78561 and previous config saved to /var/cache/conftool/dbconfig/20250623-061416-marostegui.json
  • 06:13 marostegui: Starting es6 codfw failover from es2035 to es2037 - T397597
  • 06:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2215 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78560 and previous config saved to /var/cache/conftool/dbconfig/20250623-061346-root.json
  • 06:11 marostegui@cumin1002: dbctl commit (dc=all): 'Set es2037 with weight 0 T397597', diff saved to https://phabricator.wikimedia.org/P78559 and previous config saved to /var/cache/conftool/dbconfig/20250623-061143-root.json
  • 06:09 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: T397597
  • 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P78558 and previous config saved to /var/cache/conftool/dbconfig/20250623-060140-marostegui.json
  • 05:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2215 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78557 and previous config saved to /var/cache/conftool/dbconfig/20250623-055840-root.json
  • 05:55 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: T397597
  • 05:48 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 05:47 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 05:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2215 T397419', diff saved to https://phabricator.wikimedia.org/P78556 and previous config saved to /var/cache/conftool/dbconfig/20250623-054725-marostegui.json
  • 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T396130)', diff saved to https://phabricator.wikimedia.org/P78555 and previous config saved to /var/cache/conftool/dbconfig/20250623-054633-marostegui.json
  • 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2196 to x1 primary T397419', diff saved to https://phabricator.wikimedia.org/P78554 and previous config saved to /var/cache/conftool/dbconfig/20250623-054616-marostegui.json
  • 05:45 marostegui: Starting x1 codfw failover from db2215 to db2196 - T397419
  • 05:42 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2196 with weight 0 T397419', diff saved to https://phabricator.wikimedia.org/P78553 and previous config saved to /var/cache/conftool/dbconfig/20250623-054206-root.json
  • 05:41 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Primary switchover x1 T397419
  • 05:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1160 (T396130)', diff saved to https://phabricator.wikimedia.org/P78552 and previous config saved to /var/cache/conftool/dbconfig/20250623-053857-marostegui.json
  • 05:38 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 05:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance

2025-06-20

  • 21:14 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 21:14 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 19:19 sukhe: sudo cumin -b11 'A:cp' "run-puppet-agent --enable 'merging CR 1160381'": T390924
  • 19:19 sukhe: sudo cumin -b11 'A:cp' "run-puppet-agent 'merging CR 1160381'": T390924
  • 19:16 sukhe: enabling puppet on cp4037 to merge CR 1160381: add `ismobile=1' for mobile requests: T390924
  • 19:10 sukhe: sudo cumin 'A:cp' "disable-puppet 'merging CR 1160381'": T390924
  • 18:36 bking@cumin2002: conftool action : set/weight=10:pooled=no; selector: name=cirrussearch2113\.codfw\.wmnet
  • 18:34 aokoth@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 18:33 aokoth@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 18:32 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 18:30 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 18:19 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 18:08 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 18:05 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 18:05 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 18:00 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 17:55 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:55 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:53 gmodena@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:53 gmodena@deploy1003: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:49 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:49 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:47 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 17:47 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2045* slowly with 10 steps - Pooling in slowly
  • 17:35 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 17:25 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 17:21 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 17:21 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 17:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host aux-k8s-worker1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1008.eqiad.wmnet with OS bookworm
  • 16:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:55 jclark@cumin1002: START - Cookbook sre.hosts.provision for host aux-k8s-worker1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:51 dancy@deploy1003: Installation of scap version "4.181.0" completed for 2 hosts
  • 15:49 dancy@deploy1003: Installing scap version "4.181.0" for 2 host(s)
  • 15:49 bking@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:48 bking@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:48 bking@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:47 bking@deploy1003: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:43 jclark@cumin1002: START - Cookbook sre.hosts.provision for host aux-k8s-worker1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:31 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2045* slowly with 10 steps - Pooling in slowly
  • 15:23 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es2045.codfw.wmnet
  • 15:23 fceratto@cumin1002: START - Cookbook sre.hosts.remove-downtime for es2045.codfw.wmnet
  • 15:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1007.eqiad.wmnet with OS bookworm
  • 15:22 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1006.eqiad.wmnet with OS bookworm
  • 15:16 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:16 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1009.eqiad.wmnet with OS bookworm
  • 15:13 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:13 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1007.eqiad.wmnet with reason: host reimage
  • 15:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1006.eqiad.wmnet with reason: host reimage
  • 14:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1009.eqiad.wmnet with reason: host reimage
  • 14:53 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1007.eqiad.wmnet with reason: host reimage
  • 14:53 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1006.eqiad.wmnet with reason: host reimage
  • 14:53 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1009.eqiad.wmnet with reason: host reimage
  • 14:41 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1007.eqiad.wmnet with OS bookworm
  • 14:41 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1009.eqiad.wmnet with OS bookworm
  • 14:41 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1006.eqiad.wmnet with OS bookworm
  • 14:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker1009.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host aux-k8s-worker1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:25 logmsgbot: dreamyjazz Deployed security patch for T397221
  • 14:25 inflatador: bking@cumin2002:~$ sudo cumin prometheus1007* 'run-puppet-agent' T357146
  • 14:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aux-k8s-worker1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:21 jclark@cumin1002: START - Cookbook sre.hosts.provision for host aux-k8s-worker1009.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:21 jclark@cumin1002: START - Cookbook sre.hosts.provision for host aux-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:21 jclark@cumin1002: START - Cookbook sre.hosts.provision for host aux-k8s-worker1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:16 logmsgbot: dreamyjazz Deployed security patch for T397221
  • 13:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1009.eqiad.wmnet with OS bookworm
  • 13:39 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1007.eqiad.wmnet with OS bookworm
  • 13:39 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker1009.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:39 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:38 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1006.eqiad.wmnet with OS bookworm
  • 13:36 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgrading P{lvs[5004-5005].eqsin.wmnet} and A:liberica
  • 13:35 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:33 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgrading P{lvs[5004-5005].eqsin.wmnet} and A:liberica
  • 13:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host aux-k8s-worker1009.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host aux-k8s-worker1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:26 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgrading P{lvs[4008-4009].ulsfo.wmnet} and A:liberica
  • 13:24 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgrading P{lvs[4008-4009].ulsfo.wmnet} and A:liberica
  • 13:23 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgrading P{lvs7001.magru.wmnet} and A:liberica
  • 13:23 jclark@cumin1002: START - Cookbook sre.hosts.provision for host aux-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:23 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgrading P{lvs7001.magru.wmnet} and A:liberica
  • 13:08 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgrading P{lvs7002.magru.wmnet} and A:liberica
  • 13:07 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgrading P{lvs7002.magru.wmnet} and A:liberica
  • 13:04 fceratto@cumin1002: dbctl commit (dc=all): 'Pool in API for db1252 - see T385141', diff saved to https://phabricator.wikimedia.org/P78541 and previous config saved to /var/cache/conftool/dbconfig/20250620-130423-fceratto.json
  • 12:59 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgrading P{lvs1013.eqiad.wmnet} and A:liberica
  • 12:59 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgrading P{lvs1013.eqiad.wmnet} and A:liberica
  • 12:58 vgutierrez: upload liberica 0.21 to apt.wm.o (bookworm-wikimedia)
  • 12:54 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2240.codfw.wmnet with reason: Maintenance
  • 12:42 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2239.codfw.wmnet with reason: Maintenance
  • 12:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T396130)', diff saved to https://phabricator.wikimedia.org/P78540 and previous config saved to /var/cache/conftool/dbconfig/20250620-124151-marostegui.json
  • 12:32 jakob@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:32 jakob@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 12:32 jakob@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:31 jakob@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 12:31 jakob@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:31 jakob@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 12:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P78539 and previous config saved to /var/cache/conftool/dbconfig/20250620-122644-marostegui.json
  • 12:19 slyngshede@cumin1002: END (FAIL) - Cookbook sre.netbox.update-extras (exit_code=1) rolling restart_daemons on A:netbox-canary
  • 12:18 slyngshede@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 12:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P78538 and previous config saved to /var/cache/conftool/dbconfig/20250620-121136-marostegui.json
  • 11:59 jelto: import kubernetes 1.23.14-6 and 1.31.4-5 to apt host - T387548
  • 11:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T396130)', diff saved to https://phabricator.wikimedia.org/P78537 and previous config saved to /var/cache/conftool/dbconfig/20250620-115629-marostegui.json
  • 11:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2237 (T396130)', diff saved to https://phabricator.wikimedia.org/P78536 and previous config saved to /var/cache/conftool/dbconfig/20250620-114941-marostegui.json
  • 11:49 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2237.codfw.wmnet with reason: Maintenance
  • 11:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T396130)', diff saved to https://phabricator.wikimedia.org/P78535 and previous config saved to /var/cache/conftool/dbconfig/20250620-114917-marostegui.json
  • 11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P78534 and previous config saved to /var/cache/conftool/dbconfig/20250620-113410-marostegui.json
  • 11:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P78533 and previous config saved to /var/cache/conftool/dbconfig/20250620-111901-marostegui.json
  • 11:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T396130)', diff saved to https://phabricator.wikimedia.org/P78532 and previous config saved to /var/cache/conftool/dbconfig/20250620-110354-marostegui.json
  • 10:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2236 (T396130)', diff saved to https://phabricator.wikimedia.org/P78531 and previous config saved to /var/cache/conftool/dbconfig/20250620-105701-marostegui.json
  • 10:56 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2236.codfw.wmnet with reason: Maintenance
  • 10:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T396130)', diff saved to https://phabricator.wikimedia.org/P78530 and previous config saved to /var/cache/conftool/dbconfig/20250620-105638-marostegui.json
  • 10:48 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P78529 and previous config saved to /var/cache/conftool/dbconfig/20250620-104131-marostegui.json
  • 10:29 jakob@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 10:29 jakob@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 10:28 jakob@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 10:27 jakob@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 10:26 jakob@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 10:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P78528 and previous config saved to /var/cache/conftool/dbconfig/20250620-102623-marostegui.json
  • 10:26 jakob@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 10:11 jynus@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2184.codfw.wmnet with reason: mariadb upgrade
  • 10:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T396130)', diff saved to https://phabricator.wikimedia.org/P78527 and previous config saved to /var/cache/conftool/dbconfig/20250620-101116-marostegui.json
  • 10:05 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Bugfixes - oblivian@cumin1003"
  • 10:05 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Bugfixes - oblivian@cumin1003
  • 10:04 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Bugfixes - oblivian@cumin1003
  • 10:04 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Bugfixes - oblivian@cumin1003"
  • 10:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T396130)', diff saved to https://phabricator.wikimedia.org/P78526 and previous config saved to /var/cache/conftool/dbconfig/20250620-100431-marostegui.json
  • 10:04 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 10:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T396130)', diff saved to https://phabricator.wikimedia.org/P78525 and previous config saved to /var/cache/conftool/dbconfig/20250620-100409-marostegui.json
  • 09:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:57 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:56 jakob@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 09:56 jakob@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 09:55 root@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for backup2009.codfw.wmnet: Renew puppet certificate - root@cumin1002
  • 09:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P78524 and previous config saved to /var/cache/conftool/dbconfig/20250620-094901-marostegui.json
  • 09:44 jakob@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 09:44 jakob@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 09:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P78523 and previous config saved to /var/cache/conftool/dbconfig/20250620-093354-marostegui.json
  • 09:24 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2009.codfw.wmnet with reason: Maintenance and reboot
  • 09:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:18 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T396130)', diff saved to https://phabricator.wikimedia.org/P78522 and previous config saved to /var/cache/conftool/dbconfig/20250620-091847-marostegui.json
  • 09:18 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T396130)', diff saved to https://phabricator.wikimedia.org/P78521 and previous config saved to /var/cache/conftool/dbconfig/20250620-091005-marostegui.json
  • 09:09 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 09:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T396130)', diff saved to https://phabricator.wikimedia.org/P78520 and previous config saved to /var/cache/conftool/dbconfig/20250620-090943-marostegui.json
  • 08:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P78519 and previous config saved to /var/cache/conftool/dbconfig/20250620-085435-marostegui.json
  • 08:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P78518 and previous config saved to /var/cache/conftool/dbconfig/20250620-083928-marostegui.json
  • 08:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T396130)', diff saved to https://phabricator.wikimedia.org/P78517 and previous config saved to /var/cache/conftool/dbconfig/20250620-082420-marostegui.json
  • 08:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T396130)', diff saved to https://phabricator.wikimedia.org/P78516 and previous config saved to /var/cache/conftool/dbconfig/20250620-081638-marostegui.json
  • 08:16 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 08:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T396130)', diff saved to https://phabricator.wikimedia.org/P78515 and previous config saved to /var/cache/conftool/dbconfig/20250620-081127-marostegui.json
  • 07:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc2 T378715', diff saved to https://phabricator.wikimedia.org/P78514 and previous config saved to /var/cache/conftool/dbconfig/20250620-075944-root.json
  • 07:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P78513 and previous config saved to /var/cache/conftool/dbconfig/20250620-075619-marostegui.json
  • 07:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P78512 and previous config saved to /var/cache/conftool/dbconfig/20250620-074112-marostegui.json
  • 07:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T396130)', diff saved to https://phabricator.wikimedia.org/P78511 and previous config saved to /var/cache/conftool/dbconfig/20250620-072605-marostegui.json
  • 07:20 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on es2045.codfw.wmnet with reason: Firmware downgrade pending
  • 07:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2179 (T396130)', diff saved to https://phabricator.wikimedia.org/P78510 and previous config saved to /var/cache/conftool/dbconfig/20250620-071730-marostegui.json
  • 07:17 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 07:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T396130)', diff saved to https://phabricator.wikimedia.org/P78509 and previous config saved to /var/cache/conftool/dbconfig/20250620-071707-marostegui.json
  • 07:12 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2024.codfw.wmnet
  • 07:12 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:12 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2024.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 07:11 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2024.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P78508 and previous config saved to /var/cache/conftool/dbconfig/20250620-070200-marostegui.json
  • 06:56 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 06:51 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts ganeti2024.codfw.wmnet
  • 06:50 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2023.codfw.wmnet
  • 06:50 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:50 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2023.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 06:48 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2023.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 06:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P78507 and previous config saved to /var/cache/conftool/dbconfig/20250620-064652-marostegui.json
  • 06:44 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 06:40 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts ganeti2023.codfw.wmnet
  • 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T396130)', diff saved to https://phabricator.wikimedia.org/P78506 and previous config saved to /var/cache/conftool/dbconfig/20250620-063145-marostegui.json
  • 06:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T396130)', diff saved to https://phabricator.wikimedia.org/P78504 and previous config saved to /var/cache/conftool/dbconfig/20250620-062307-marostegui.json
  • 06:23 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 06:17 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 06:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T396130)', diff saved to https://phabricator.wikimedia.org/P78503 and previous config saved to /var/cache/conftool/dbconfig/20250620-061659-marostegui.json
  • 06:02 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-staging-etcd2002.codfw.wmnet to plain
  • 06:02 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-staging-etcd2002.codfw.wmnet to plain
  • 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P78502 and previous config saved to /var/cache/conftool/dbconfig/20250620-060151-marostegui.json
  • 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P78501 and previous config saved to /var/cache/conftool/dbconfig/20250620-054644-marostegui.json
  • 05:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T396130)', diff saved to https://phabricator.wikimedia.org/P78500 and previous config saved to /var/cache/conftool/dbconfig/20250620-053137-marostegui.json
  • 05:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T396130)', diff saved to https://phabricator.wikimedia.org/P78499 and previous config saved to /var/cache/conftool/dbconfig/20250620-052300-marostegui.json
  • 05:22 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 05:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Upgrade
  • 05:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 05:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on pc1012.eqiad.wmnet with reason: Maintenance

2025-06-19

  • 23:40 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 23:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T396130)', diff saved to https://phabricator.wikimedia.org/P78498 and previous config saved to /var/cache/conftool/dbconfig/20250619-234009-marostegui.json
  • 23:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P78497 and previous config saved to /var/cache/conftool/dbconfig/20250619-232502-marostegui.json
  • 23:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P78496 and previous config saved to /var/cache/conftool/dbconfig/20250619-230955-marostegui.json
  • 22:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T396130)', diff saved to https://phabricator.wikimedia.org/P78495 and previous config saved to /var/cache/conftool/dbconfig/20250619-225448-marostegui.json
  • 22:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1253 (T396130)', diff saved to https://phabricator.wikimedia.org/P78494 and previous config saved to /var/cache/conftool/dbconfig/20250619-224901-marostegui.json
  • 22:48 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1253.eqiad.wmnet with reason: Maintenance
  • 22:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T396130)', diff saved to https://phabricator.wikimedia.org/P78493 and previous config saved to /var/cache/conftool/dbconfig/20250619-224839-marostegui.json
  • 22:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P78492 and previous config saved to /var/cache/conftool/dbconfig/20250619-223332-marostegui.json
  • 22:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P78491 and previous config saved to /var/cache/conftool/dbconfig/20250619-221824-marostegui.json
  • 22:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T396130)', diff saved to https://phabricator.wikimedia.org/P78490 and previous config saved to /var/cache/conftool/dbconfig/20250619-220317-marostegui.json
  • 21:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T396130)', diff saved to https://phabricator.wikimedia.org/P78489 and previous config saved to /var/cache/conftool/dbconfig/20250619-214540-marostegui.json
  • 21:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 21:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T396130)', diff saved to https://phabricator.wikimedia.org/P78488 and previous config saved to /var/cache/conftool/dbconfig/20250619-214517-marostegui.json
  • 21:39 kostajh: mwscript-k8s -f --comment="T397483" -- updateCollation.php --wiki=plwikiquote --previous-collation=uppercase
  • 21:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P78487 and previous config saved to /var/cache/conftool/dbconfig/20250619-213010-marostegui.json
  • 21:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P78486 and previous config saved to /var/cache/conftool/dbconfig/20250619-211502-marostegui.json
  • 20:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T396130)', diff saved to https://phabricator.wikimedia.org/P78485 and previous config saved to /var/cache/conftool/dbconfig/20250619-205955-marostegui.json
  • 20:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T396130)', diff saved to https://phabricator.wikimedia.org/P78484 and previous config saved to /var/cache/conftool/dbconfig/20250619-205505-marostegui.json
  • 20:54 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 20:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T396130)', diff saved to https://phabricator.wikimedia.org/P78483 and previous config saved to /var/cache/conftool/dbconfig/20250619-205443-marostegui.json
  • 20:40 kostajh: UTC late deploys done
  • 20:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P78482 and previous config saved to /var/cache/conftool/dbconfig/20250619-203935-marostegui.json
  • 20:36 kharlan@deploy1003: Finished scap sync-world: Backport for PageChangeEmissionTest: order move events by kind. (T397087), DomainEvents: Constant repeating notifications (T397103) (duration: 10m 08s)
  • 20:29 kharlan@deploy1003: kharlan, matmarex: Continuing with sync
  • 20:28 kharlan@deploy1003: kharlan, matmarex: Backport for PageChangeEmissionTest: order move events by kind. (T397087), DomainEvents: Constant repeating notifications (T397103) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:26 kharlan@deploy1003: Started scap sync-world: Backport for PageChangeEmissionTest: order move events by kind. (T397087), DomainEvents: Constant repeating notifications (T397103)
  • 20:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P78481 and previous config saved to /var/cache/conftool/dbconfig/20250619-202428-marostegui.json
  • 20:15 kharlan@deploy1003: Finished scap sync-world: Backport for Configure instrument for CheckUser - UserInfoCard (T386440), Set category collation to `uca-pl-u-kn` for plwikiquote (T397466) (duration: 10m 30s)
  • 20:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T396130)', diff saved to https://phabricator.wikimedia.org/P78480 and previous config saved to /var/cache/conftool/dbconfig/20250619-200921-marostegui.json
  • 20:08 kharlan@deploy1003: kharlan, msz2001, mimurawil: Continuing with sync
  • 20:06 kharlan@deploy1003: kharlan, msz2001, mimurawil: Backport for Configure instrument for CheckUser - UserInfoCard (T386440), Set category collation to `uca-pl-u-kn` for plwikiquote (T397466) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T396130)', diff saved to https://phabricator.wikimedia.org/P78479 and previous config saved to /var/cache/conftool/dbconfig/20250619-200432-marostegui.json
  • 20:04 kharlan@deploy1003: Started scap sync-world: Backport for Configure instrument for CheckUser - UserInfoCard (T386440), Set category collation to `uca-pl-u-kn` for plwikiquote (T397466)
  • 20:04 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 20:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T396130)', diff saved to https://phabricator.wikimedia.org/P78478 and previous config saved to /var/cache/conftool/dbconfig/20250619-200409-marostegui.json
  • 19:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P78477 and previous config saved to /var/cache/conftool/dbconfig/20250619-194902-marostegui.json
  • 19:36 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 19:36 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 19:35 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 19:35 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 19:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P78476 and previous config saved to /var/cache/conftool/dbconfig/20250619-193355-marostegui.json
  • 19:25 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 19:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 19:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T396130)', diff saved to https://phabricator.wikimedia.org/P78475 and previous config saved to /var/cache/conftool/dbconfig/20250619-191848-marostegui.json
  • 19:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T396130)', diff saved to https://phabricator.wikimedia.org/P78474 and previous config saved to /var/cache/conftool/dbconfig/20250619-191401-marostegui.json
  • 19:13 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 19:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T396130)', diff saved to https://phabricator.wikimedia.org/P78473 and previous config saved to /var/cache/conftool/dbconfig/20250619-191339-marostegui.json
  • 18:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P78472 and previous config saved to /var/cache/conftool/dbconfig/20250619-185832-marostegui.json
  • 18:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P78471 and previous config saved to /var/cache/conftool/dbconfig/20250619-184325-marostegui.json
  • 18:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T396130)', diff saved to https://phabricator.wikimedia.org/P78470 and previous config saved to /var/cache/conftool/dbconfig/20250619-182817-marostegui.json
  • 18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T396130)', diff saved to https://phabricator.wikimedia.org/P78469 and previous config saved to /var/cache/conftool/dbconfig/20250619-182320-marostegui.json
  • 18:23 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T396130)', diff saved to https://phabricator.wikimedia.org/P78468 and previous config saved to /var/cache/conftool/dbconfig/20250619-182258-marostegui.json
  • 18:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P78467 and previous config saved to /var/cache/conftool/dbconfig/20250619-180751-marostegui.json
  • 17:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P78466 and previous config saved to /var/cache/conftool/dbconfig/20250619-175244-marostegui.json
  • 17:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T396130)', diff saved to https://phabricator.wikimedia.org/P78465 and previous config saved to /var/cache/conftool/dbconfig/20250619-173737-marostegui.json
  • 17:32 sukhe: forcing agent run on A:liberica-drmrs to merge CR 1161576
  • 17:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T396130)', diff saved to https://phabricator.wikimedia.org/P78464 and previous config saved to /var/cache/conftool/dbconfig/20250619-173142-marostegui.json
  • 17:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 17:12 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 17:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T396130)', diff saved to https://phabricator.wikimedia.org/P78462 and previous config saved to /var/cache/conftool/dbconfig/20250619-171201-marostegui.json
  • 17:10 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 17:10 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 17:09 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 17:09 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 17:08 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 17:08 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 17:03 logmsgbot: dreamyjazz Deployed security patch for T397088
  • 16:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P78461 and previous config saved to /var/cache/conftool/dbconfig/20250619-165653-marostegui.json
  • 16:52 logmsgbot: dreamyjazz Deployed security patch for T397196
  • 16:43 logmsgbot: dreamyjazz Deployed security patch for T397196
  • 16:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P78460 and previous config saved to /var/cache/conftool/dbconfig/20250619-164146-marostegui.json
  • 16:33 logmsgbot: dreamyjazz Deployed security patch for T396750
  • 16:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T396130)', diff saved to https://phabricator.wikimedia.org/P78459 and previous config saved to /var/cache/conftool/dbconfig/20250619-162639-marostegui.json
  • 16:24 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum7003.magru.wmnet with OS bookworm
  • 16:24 logmsgbot: dreamyjazz Deployed security patch for T396750
  • 16:07 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum7003.magru.wmnet with reason: host reimage
  • 16:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T396130)', diff saved to https://phabricator.wikimedia.org/P78458 and previous config saved to /var/cache/conftool/dbconfig/20250619-160451-marostegui.json
  • 16:04 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 16:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T396130)', diff saved to https://phabricator.wikimedia.org/P78457 and previous config saved to /var/cache/conftool/dbconfig/20250619-160429-marostegui.json
  • 16:04 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum7003.magru.wmnet with reason: host reimage
  • 15:56 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:53 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 15:53 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc2012
  • 15:53 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc2012
  • 15:50 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc2012
  • 15:50 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc2012
  • 15:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P78456 and previous config saved to /var/cache/conftool/dbconfig/20250619-154921-marostegui.json
  • 15:48 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:46 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 15:36 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum3004.esams.wmnet with OS bookworm
  • 15:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P78455 and previous config saved to /var/cache/conftool/dbconfig/20250619-153414-marostegui.json
  • 15:31 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 15:31 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 15:30 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 15:30 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host durum7003.magru.wmnet with OS bookworm
  • 15:30 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 15:27 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 15:26 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 15:26 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 15:26 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 15:20 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T396130)', diff saved to https://phabricator.wikimedia.org/P78454 and previous config saved to /var/cache/conftool/dbconfig/20250619-151907-marostegui.json
  • 15:17 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3004.esams.wmnet with reason: host reimage
  • 15:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T396130)', diff saved to https://phabricator.wikimedia.org/P78453 and previous config saved to /var/cache/conftool/dbconfig/20250619-151402-marostegui.json
  • 15:13 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:13 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum3004.esams.wmnet with reason: host reimage
  • 15:13 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 14:56 jmm@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2021.codfw.wmnet with reason: remove for decom
  • 14:56 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 14:54 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2021.codfw.wmnet
  • 14:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T396130)', diff saved to https://phabricator.wikimedia.org/P78452 and previous config saved to /var/cache/conftool/dbconfig/20250619-144855-marostegui.json
  • 14:48 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host durum3004.esams.wmnet with OS bookworm
  • 14:47 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-staging-etcd2001.codfw.wmnet to plain
  • 14:46 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-staging-etcd2001.codfw.wmnet to plain
  • 14:46 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum3003.esams.wmnet with OS bookworm
  • 14:46 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-staging-etcd2003.codfw.wmnet to plain
  • 14:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:45 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-staging-etcd2003.codfw.wmnet to plain
  • 14:44 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:43 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2021.codfw.wmnet
  • 14:41 moritzm: removing ganeti2021 from codfw cluster for decom T396590
  • 14:38 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2021.codfw.wmnet
  • 14:36 moritzm: installing twitter-bootstrap3 security updates
  • 14:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P78451 and previous config saved to /var/cache/conftool/dbconfig/20250619-143348-marostegui.json
  • 14:26 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3003.esams.wmnet with reason: host reimage
  • 14:23 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum3003.esams.wmnet with reason: host reimage
  • 14:23 vgutierrez: repool lvs4008 (text) using katran - T396561
  • 14:22 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs4008.ulsfo.wmnet} and A:liberica (T396561)
  • 14:22 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs4008.ulsfo.wmnet} and A:liberica (T396561)
  • 14:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P78450 and previous config saved to /var/cache/conftool/dbconfig/20250619-141841-marostegui.json
  • 14:09 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade of ATS on P{cp7002*} and A:cp - 9.2.11 upgrade (T397456)
  • 14:04 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade of ATS on P{cp7002*} and A:cp - 9.2.11 upgrade (T397456)
  • 14:03 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4008.ulsfo.wmnet
  • 14:03 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs4008.ulsfo.wmnet
  • 14:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T396130)', diff saved to https://phabricator.wikimedia.org/P78449 and previous config saved to /var/cache/conftool/dbconfig/20250619-140334-marostegui.json
  • 14:01 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade of ATS on P{cp4037*} and A:cp - 9.2.11 upgrade (T397456)
  • 13:59 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host durum3003.esams.wmnet with OS bookworm
  • 13:57 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade of ATS on P{cp4037*} and A:cp - 9.2.11 upgrade (T397456)
  • 13:56 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:56 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.2.11-1wm1_amd64.changes: T397456
  • 13:54 jmm@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2021.codfw.wmnet with reason: remove for decom
  • 13:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2222 (T396130)', diff saved to https://phabricator.wikimedia.org/P78448 and previous config saved to /var/cache/conftool/dbconfig/20250619-134548-marostegui.json
  • 13:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2222.codfw.wmnet with reason: Maintenance
  • 13:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T396130)', diff saved to https://phabricator.wikimedia.org/P78447 and previous config saved to /var/cache/conftool/dbconfig/20250619-134525-marostegui.json
  • 13:42 akosiaris@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:39 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs4008.ulsfo.wmnet} and A:liberica (T396561)
  • 13:38 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin depooling P{lvs4008.ulsfo.wmnet} and A:liberica (T396561)
  • 13:38 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs4008.ulsfo.wmnet with reason: switching to katran
  • 13:34 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Enable ScopedTypeaheadSearch on Wikidata (T394670) (duration: 13m 20s)
  • 13:32 akosiaris@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:31 vgutierrez: repool lvs4009 (upload) using katran - T396561
  • 13:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P78446 and previous config saved to /var/cache/conftool/dbconfig/20250619-133018-marostegui.json
  • 13:27 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Continuing with sync
  • 13:26 akosiaris@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker2008.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:26 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs4009.ulsfo.wmnet} and A:liberica (T396561)
  • 13:26 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org
  • 13:26 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs4009.ulsfo.wmnet} and A:liberica (T396561)
  • 13:23 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Backport for Enable ScopedTypeaheadSearch on Wikidata (T394670) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:22 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org
  • 13:21 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Enable ScopedTypeaheadSearch on Wikidata (T394670)
  • 13:18 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4009.ulsfo.wmnet
  • 13:18 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs4009.ulsfo.wmnet
  • 13:16 akosiaris@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2008.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P78445 and previous config saved to /var/cache/conftool/dbconfig/20250619-131510-marostegui.json
  • 13:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T396130)', diff saved to https://phabricator.wikimedia.org/P78443 and previous config saved to /var/cache/conftool/dbconfig/20250619-130003-marostegui.json
  • 12:48 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 12:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts htmldumper1001.eqiad.wmnet
  • 12:44 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:44 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: htmldumper1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 12:44 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: htmldumper1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 12:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2221 (T396130)', diff saved to https://phabricator.wikimedia.org/P78442 and previous config saved to /var/cache/conftool/dbconfig/20250619-124210-marostegui.json
  • 12:42 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2221.codfw.wmnet with reason: Maintenance
  • 12:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2186,2196].codfw.wmnet with reason: Maintenance
  • 12:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T396130)', diff saved to https://phabricator.wikimedia.org/P78441 and previous config saved to /var/cache/conftool/dbconfig/20250619-124148-marostegui.json
  • 12:39 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 12:34 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts htmldumper1001.eqiad.wmnet
  • 12:31 jakob@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:31 jakob@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 12:30 jakob@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:30 jakob@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 12:29 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 12:29 jakob@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:29 jakob@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 12:29 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 12:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P78440 and previous config saved to /var/cache/conftool/dbconfig/20250619-122640-marostegui.json
  • 12:23 jakob@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:23 jakob@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 12:22 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 12:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 12:22 jakob@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:21 jakob@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 12:21 jakob@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:21 jakob@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 12:19 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:19 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:17 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:16 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:14 hnowlan@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:12 hnowlan@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:12 hnowlan@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:11 hnowlan@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 12:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P78438 and previous config saved to /var/cache/conftool/dbconfig/20250619-121133-marostegui.json
  • 12:09 jmm@dns1004: END - running authdns-update
  • 12:08 jmm@dns1004: START - running authdns-update
  • 12:01 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) pool all services in codfw/codfw: maintenance
  • 11:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T396130)', diff saved to https://phabricator.wikimedia.org/P78437 and previous config saved to /var/cache/conftool/dbconfig/20250619-115626-marostegui.json
  • 11:46 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-cluster pool all services in codfw/codfw: maintenance
  • 11:43 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:42 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2220 (T396130)', diff saved to https://phabricator.wikimedia.org/P78436 and previous config saved to /var/cache/conftool/dbconfig/20250619-113931-marostegui.json
  • 11:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 11:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T396130)', diff saved to https://phabricator.wikimedia.org/P78435 and previous config saved to /var/cache/conftool/dbconfig/20250619-113908-marostegui.json
  • 11:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P78434 and previous config saved to /var/cache/conftool/dbconfig/20250619-112401-marostegui.json
  • 11:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P78433 and previous config saved to /var/cache/conftool/dbconfig/20250619-110854-marostegui.json
  • 11:02 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker2002.codfw.wmnet with OS bookworm
  • 10:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T396130)', diff saved to https://phabricator.wikimedia.org/P78432 and previous config saved to /var/cache/conftool/dbconfig/20250619-105347-marostegui.json
  • 10:50 slyngshede@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.11 to netbox-next - slyngshede@cumin1002 - T397300
  • 10:47 slyngshede@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.11 to netbox-next - slyngshede@cumin1002 - T397300
  • 10:46 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker2002.codfw.wmnet with reason: host reimage
  • 10:46 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) depool all services in codfw/codfw: maintenance
  • 10:42 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker2002.codfw.wmnet with reason: host reimage
  • 10:42 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 10:39 moritzm: installing Django security updates
  • 10:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T396130)', diff saved to https://phabricator.wikimedia.org/P78431 and previous config saved to /var/cache/conftool/dbconfig/20250619-103400-marostegui.json
  • 10:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 10:32 akosiaris@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2006.codfw.wmnet with OS bookworm
  • 10:32 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 10:32 moritzm: installing twisted security updates
  • 10:31 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-cluster depool all services in codfw/codfw: maintenance
  • 10:31 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 10:31 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 10:28 moritzm: installing postgresql-13 security updates
  • 10:25 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host dse-k8s-worker2002
  • 10:25 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker2002
  • 10:25 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker2002
  • 10:25 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker2002.codfw.wmnet 86.48.192.10.in-addr.arpa 6.8.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 10:25 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-worker2002.codfw.wmnet 86.48.192.10.in-addr.arpa 6.8.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 10:25 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:25 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host dse-k8s-worker2002 - btullis@cumin1003"
  • 10:25 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host dse-k8s-worker2002 - btullis@cumin1003"
  • 10:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 10:23 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 10:23 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 10:23 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix api token loading - oblivian@cumin1003"
  • 10:23 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix api token loading - oblivian@cumin1003
  • 10:23 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix api token loading - oblivian@cumin1003
  • 10:23 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix api token loading - oblivian@cumin1003"
  • 10:23 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 10:20 Amir1: dropping searchindex table in itwiki (T397367)
  • 10:19 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 10:19 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host dse-k8s-worker2002
  • 10:19 Emperor: depool / restart / repool ms-fe1009 [some idle timeouts]
  • 10:19 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker2002.codfw.wmnet with OS bookworm
  • 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker2001.codfw.wmnet with OS bookworm
  • 10:18 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 10:17 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 10:17 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 10:16 akosiaris@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on aux-k8s-worker2006.codfw.wmnet with reason: host reimage
  • 10:16 akosiaris@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2006.codfw.wmnet with reason: host reimage
  • 10:14 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 10:14 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 10:14 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 10:14 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 10:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2196 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78430 and previous config saved to /var/cache/conftool/dbconfig/20250619-101317-root.json
  • 10:12 godog: powercycle netmon1003
  • 10:12 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 10:12 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 10:04 akosiaris@cumin1003: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2006.codfw.wmnet with OS bookworm
  • 10:02 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker2001.codfw.wmnet with reason: host reimage
  • 10:01 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 10:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T396130)', diff saved to https://phabricator.wikimedia.org/P78429 and previous config saved to /var/cache/conftool/dbconfig/20250619-100102-marostegui.json
  • 09:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2196 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78428 and previous config saved to /var/cache/conftool/dbconfig/20250619-095811-root.json
  • 09:57 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker2001.codfw.wmnet with reason: host reimage
  • 09:54 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Homer release to add wikikube-worker-exp - cmooney@cumin1003
  • 09:52 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Homer release to add wikikube-worker-exp - cmooney@cumin1003
  • 09:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P78427 and previous config saved to /var/cache/conftool/dbconfig/20250619-094554-marostegui.json
  • 09:45 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "New api endpoints for the requestctl client - oblivian@cumin1003"
  • 09:45 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: New api endpoints for the requestctl client - oblivian@cumin1003
  • 09:44 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: New api endpoints for the requestctl client - oblivian@cumin1003
  • 09:44 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "New api endpoints for the requestctl client - oblivian@cumin1003"
  • 09:43 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2196 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78426 and previous config saved to /var/cache/conftool/dbconfig/20250619-094306-root.json
  • 09:40 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host dse-k8s-worker2001
  • 09:40 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker2001
  • 09:40 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker2001
  • 09:40 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker2001.codfw.wmnet 126.32.192.10.in-addr.arpa 6.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 09:39 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-worker2001.codfw.wmnet 126.32.192.10.in-addr.arpa 6.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 09:39 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:39 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host dse-k8s-worker2001 - btullis@cumin1003"
  • 09:39 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host dse-k8s-worker2001 - btullis@cumin1003"
  • 09:36 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 09:36 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:33 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:31 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 09:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P78425 and previous config saved to /var/cache/conftool/dbconfig/20250619-093047-marostegui.json
  • 09:30 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host dse-k8s-worker2001
  • 09:30 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker2001.codfw.wmnet with OS bookworm
  • 09:28 elukey@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2196 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78424 and previous config saved to /var/cache/conftool/dbconfig/20250619-092801-root.json
  • 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:25 elukey@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:22 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:19 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:19 elukey@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:17 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kafka-stretch2002 to dse-k8s-worker2002
  • 09:16 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker2002
  • 09:16 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2196.codfw.wmnet with reason: Maintenance
  • 09:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T396130)', diff saved to https://phabricator.wikimedia.org/P78423 and previous config saved to /var/cache/conftool/dbconfig/20250619-091539-marostegui.json
  • 09:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2196', diff saved to https://phabricator.wikimedia.org/P78422 and previous config saved to /var/cache/conftool/dbconfig/20250619-091532-root.json
  • 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2002.codfw.wmnet
  • 09:14 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker2002
  • 09:14 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker2002 on all recursors
  • 09:14 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-worker2002 on all recursors
  • 09:14 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:14 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kafka-stretch2002 to dse-k8s-worker2002 - btullis@cumin1003"
  • 09:10 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kafka-stretch2002 to dse-k8s-worker2002 - btullis@cumin1003"
  • 09:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2002.codfw.wmnet
  • 09:02 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 09:01 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 09:01 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 09:01 urbanecm@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 09:00 btullis@cumin1003: START - Cookbook sre.hosts.rename from kafka-stretch2002 to dse-k8s-worker2002
  • 08:59 urbanecm@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 08:59 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kafka-stretch2001 to dse-k8s-worker2001
  • 08:59 urbanecm@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 08:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s8 T397164
  • 08:58 urbanecm@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 08:58 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker2001
  • 08:58 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker2001
  • 08:58 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker2001 on all recursors
  • 08:58 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-worker2001 on all recursors
  • 08:58 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:58 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kafka-stretch2001 to dse-k8s-worker2001 - btullis@cumin1003"
  • 08:56 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kafka-stretch2001 to dse-k8s-worker2001 - btullis@cumin1003"
  • 08:56 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2021.codfw.wmnet
  • 08:55 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2021.codfw.wmnet
  • 08:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T396130)', diff saved to https://phabricator.wikimedia.org/P78421 and previous config saved to /var/cache/conftool/dbconfig/20250619-085357-marostegui.json
  • 08:53 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 08:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78420 and previous config saved to /var/cache/conftool/dbconfig/20250619-085344-root.json
  • 08:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T396130)', diff saved to https://phabricator.wikimedia.org/P78419 and previous config saved to /var/cache/conftool/dbconfig/20250619-085334-marostegui.json
  • 08:52 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 08:52 btullis@cumin1003: START - Cookbook sre.hosts.rename from kafka-stretch2001 to dse-k8s-worker2001
  • 08:50 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2021.codfw.wmnet
  • 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db[2160,2235].codfw.wmnet with reason: Maintenance
  • 08:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2235.codfw.wmnet with reason: Maintenance
  • 08:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78418 and previous config saved to /var/cache/conftool/dbconfig/20250619-083832-root.json
  • 08:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P78417 and previous config saved to /var/cache/conftool/dbconfig/20250619-083827-marostegui.json
  • 08:33 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thanos-be[1001-1004].eqiad.wmnet
  • 08:33 mvernon@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:33 mvernon@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: thanos-be[1001-1004].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin1003"
  • 08:31 moritzm: installing modsecurity-apache security updates
  • 08:31 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2048.codfw.wmnet to cluster codfw and group B
  • 08:29 mvernon@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: thanos-be[1001-1004].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin1003"
  • 08:28 akosiaris@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 08:25 mvernon@cumin1003: START - Cookbook sre.dns.netbox
  • 08:23 akosiaris@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 08:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78415 and previous config saved to /var/cache/conftool/dbconfig/20250619-082326-root.json
  • 08:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P78414 and previous config saved to /var/cache/conftool/dbconfig/20250619-082320-marostegui.json
  • 08:17 Ammar: Ran fixStuckGlobalRename.php for T397384 T397219 T397218
  • 08:12 hashar@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.6 refs T392176
  • 08:10 mvernon@cumin1003: START - Cookbook sre.hosts.decommission for hosts thanos-be[1001-1004].eqiad.wmnet
  • 08:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78413 and previous config saved to /var/cache/conftool/dbconfig/20250619-080820-root.json
  • 08:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T396130)', diff saved to https://phabricator.wikimedia.org/P78412 and previous config saved to /var/cache/conftool/dbconfig/20250619-080812-marostegui.json
  • 08:07 moritzm: installing python-tornado security updates
  • 07:56 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 07:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P78411 and previous config saved to /var/cache/conftool/dbconfig/20250619-075548-root.json
  • 07:50 moritzm: installing glib2.0 security updates
  • 07:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T396130)', diff saved to https://phabricator.wikimedia.org/P78410 and previous config saved to /var/cache/conftool/dbconfig/20250619-074731-marostegui.json
  • 07:47 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 07:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T396130)', diff saved to https://phabricator.wikimedia.org/P78409 and previous config saved to /var/cache/conftool/dbconfig/20250619-074708-marostegui.json
  • 07:41 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[1164,1217].eqiad.wmnet with reason: Maintenance
  • 07:41 jmm@cumin1003: START - Cookbook sre.ganeti.addnode for new host ganeti2048.codfw.wmnet to cluster codfw and group B
  • 07:41 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2047.codfw.wmnet to cluster codfw and group B
  • 07:39 jmm@cumin1003: START - Cookbook sre.ganeti.addnode for new host ganeti2047.codfw.wmnet to cluster codfw and group B
  • 07:37 jynus: just started es read only backup regeneration T387892
  • 07:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2048.codfw.wmnet
  • 07:33 marostegui: Failover m2 from db1164 to db1250 - T397182
  • 07:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P78407 and previous config saved to /var/cache/conftool/dbconfig/20250619-073201-marostegui.json
  • 07:31 kartik@deploy1003: Finished scap sync-world: Backport for Enable the Contribute menu in Egyptian Arabic, Igbo, and Uzbek (duration: 10m 59s)
  • 07:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2048.codfw.wmnet
  • 07:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.11 to netbox-next - slyngshede@cumin1002 - T397300
  • 07:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2047.codfw.wmnet
  • 07:25 slyngshede@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.11 to netbox-next - slyngshede@cumin1002 - T397300
  • 07:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2233].codfw.wmnet,db[1164,1217,1250].eqiad.wmnet with reason: Primary switchover m2 T397182
  • 07:24 kartik@deploy1003: kartik: Continuing with sync
  • 07:22 kartik@deploy1003: kartik: Backport for Enable the Contribute menu in Egyptian Arabic, Igbo, and Uzbek synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:20 kartik@deploy1003: Started scap sync-world: Backport for Enable the Contribute menu in Egyptian Arabic, Igbo, and Uzbek
  • 07:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2047.codfw.wmnet
  • 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P78405 and previous config saved to /var/cache/conftool/dbconfig/20250619-071654-marostegui.json
  • 07:15 gkyziridis@deploy1003: Finished scap sync-world: Backport for ores-extension: enable extension with revertrisk filter for azwiki (T395824) (duration: 11m 50s)
  • 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync
  • 07:06 gkyziridis@deploy1003: gkyziridis: Backport for ores-extension: enable extension with revertrisk filter for azwiki (T395824) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:04 moritzm: installing edk2 security updates
  • 07:04 gkyziridis@deploy1003: Started scap sync-world: Backport for ores-extension: enable extension with revertrisk filter for azwiki (T395824)
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T396130)', diff saved to https://phabricator.wikimedia.org/P78404 and previous config saved to /var/cache/conftool/dbconfig/20250619-070146-marostegui.json
  • 06:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T396130)', diff saved to https://phabricator.wikimedia.org/P78403 and previous config saved to /var/cache/conftool/dbconfig/20250619-064108-marostegui.json
  • 06:41 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 06:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T396130)', diff saved to https://phabricator.wikimedia.org/P78402 and previous config saved to /var/cache/conftool/dbconfig/20250619-064045-marostegui.json
  • 06:39 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker1176.eqiad.wmnet with reason: Upgrade an-worker hard drives from 4TB to 8TB group 9
  • 06:39 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker1154.eqiad.wmnet with reason: Upgrade an-worker hard drives from 4TB to 8TB group 9
  • 06:38 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker1175.eqiad.wmnet with reason: Upgrade an-worker hard drives from 4TB to 8TB group 10
  • 06:37 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1149-1153].eqiad.wmnet with reason: Upgrade an-worker hard drives from 4TB to 8TB group 10
  • 06:29 marostegui@cumin1002: dbctl commit (dc=all): 'db2186 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78401 and previous config saved to /var/cache/conftool/dbconfig/20250619-062936-root.json
  • 06:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P78400 and previous config saved to /var/cache/conftool/dbconfig/20250619-062537-marostegui.json
  • 06:14 marostegui@cumin1002: dbctl commit (dc=all): 'db2186 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78399 and previous config saved to /var/cache/conftool/dbconfig/20250619-061430-root.json
  • 06:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P78398 and previous config saved to /var/cache/conftool/dbconfig/20250619-061030-marostegui.json
  • 05:59 marostegui@cumin1002: dbctl commit (dc=all): 'db2186 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78397 and previous config saved to /var/cache/conftool/dbconfig/20250619-055924-root.json
  • 05:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T396130)', diff saved to https://phabricator.wikimedia.org/P78396 and previous config saved to /var/cache/conftool/dbconfig/20250619-055522-marostegui.json
  • 05:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2186 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78395 and previous config saved to /var/cache/conftool/dbconfig/20250619-054418-root.json
  • 05:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2186', diff saved to https://phabricator.wikimedia.org/P78394 and previous config saved to /var/cache/conftool/dbconfig/20250619-053826-root.json
  • 05:38 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 05:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T396130)', diff saved to https://phabricator.wikimedia.org/P78393 and previous config saved to /var/cache/conftool/dbconfig/20250619-053433-marostegui.json
  • 05:34 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 05:09 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2012.codfw.wmnet,pc1012.eqiad.wmnet with reason: Maintenance
  • 05:08 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1012.eqiad.wmnet with reason: Maintenance
  • 05:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc2 T378715', diff saved to https://phabricator.wikimedia.org/P78392 and previous config saved to /var/cache/conftool/dbconfig/20250619-050725-root.json
  • 05:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 01:55 ejegg: re-enabled recurring donation charge job
  • 01:46 eileen: civicrm upgraded from e22e242f to 3288293e
  • 01:12 eileen: civicrm upgraded from c6225a10 to e22e242f
  • 00:39 eileen: config revision changed from 3521b9fe to a5546244
  • 00:21 ejegg: civicrm upgraded from 670b3f6b to c6225a10

2025-06-18

  • 22:57 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host aux-k8s-worker1009
  • 22:56 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host aux-k8s-worker1009
  • 22:56 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host aux-k8s-worker1008
  • 22:55 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host aux-k8s-worker1008
  • 22:55 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host aux-k8s-worker1007
  • 22:54 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host aux-k8s-worker1007
  • 22:54 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host aux-k8s-worker1006
  • 22:52 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host aux-k8s-worker1006
  • 22:47 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:47 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for aux-k8s-worker100[6-9] - jclark@cumin1002"
  • 22:47 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for aux-k8s-worker100[6-9] - jclark@cumin1002"
  • 22:44 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 22:19 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 22:19 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 22:14 brennen@deploy1003: Finished deploy [phabricator/deployment@6af4bb7]: merge-phorge-2024.35 deploy to phab1005 (T390034) (duration: 00m 26s)
  • 22:14 brennen@deploy1003: Started deploy [phabricator/deployment@6af4bb7]: merge-phorge-2024.35 deploy to phab1005 (T390034)
  • 22:12 brennen@deploy1003: Finished deploy [phabricator/deployment@f8d7b38]: no-op deploy to phab1005 (duration: 00m 07s)
  • 22:12 brennen@deploy1003: Started deploy [phabricator/deployment@f8d7b38]: no-op deploy to phab1005
  • 22:09 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-codfw and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
  • 22:05 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp3072.*
  • 22:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp3072.esams.wmnet
  • 22:04 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for cp3072.esams.wmnet
  • 22:01 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:59 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host cloudcephosd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:41 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp3072.esams.wmnet with reason: BIOS upgrades
  • 21:40 brett: Depooling cp3072 to upgrade BIOS
  • 21:39 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp3072.*
  • 21:39 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp3072.*
  • 21:36 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7001.*
  • 21:31 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 21:30 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 21:23 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2005.codfw.wmnet with OS bullseye
  • 21:22 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:22 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:22 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:07 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host cloudcephosd2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:07 dancy@deploy1003: Installation of scap version "4.180.0" completed for 2 hosts
  • 21:06 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host cloudcephosd2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:06 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host cloudcephosd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:05 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:05 dancy@deploy1003: Installing scap version "4.180.0" for 2 host(s)
  • 21:04 ebernhardson@deploy1003: Finished scap sync-world: Backport for Use discovery dns for elasticsearch read traffic (T143553) (duration: 10m 14s)
  • 21:03 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 20:59 ebernhardson: updateCollation.php for azwikibooks, azwikiquote, azwikisource, and azwiktionary completed
  • 20:59 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:59 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:59 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:57 ebernhardson@deploy1003: ebernhardson: Continuing with sync
  • 20:56 ebernhardson@deploy1003: ebernhardson: Backport for Use discovery dns for elasticsearch read traffic (T143553) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:54 ebernhardson@deploy1003: Started scap sync-world: Backport for Use discovery dns for elasticsearch read traffic (T143553)
  • 20:52 ebernhardson: running updateCollation.php for azwikibooks, azwikiquote, azwikisource, and azwiktionary
  • 20:50 ebernhardson@deploy1003: Finished scap sync-world: Backport for Set category collation to "uca-az" for Azerbaijani projects (T395896) (duration: 11m 06s)
  • 20:46 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host cloudcephosd2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:46 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host cloudcephosd2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:45 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host cloudcephosd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:44 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp7001.*
  • 20:44 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Release v0.10.1 - volans@cumin2002
  • 20:44 ebernhardson@deploy1003: nmw03, ebernhardson: Continuing with sync
  • 20:43 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Release v0.10.1 - volans@cumin2002
  • 20:42 ebernhardson@deploy1003: nmw03, ebernhardson: Backport for Set category collation to "uca-az" for Azerbaijani projects (T395896) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:39 ebernhardson@deploy1003: Started scap sync-world: Backport for Set category collation to "uca-az" for Azerbaijani projects (T395896)
  • 20:38 ebernhardson@deploy1003: Finished scap sync-world: Backport for cirrus: Add services for read operations (T143553) (duration: 11m 11s)
  • 20:37 hashar: gerrit: deleted a bunch of obsolete references under `refs/users/*` across all repositories. See T397317 (private)
  • 20:31 ebernhardson@deploy1003: ebernhardson: Continuing with sync
  • 20:31 cdobbins@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade of ATS on A:cp-codfw and A:cp - 9.2.10 upgrade (T390912)
  • 20:29 ebernhardson@deploy1003: ebernhardson: Backport for cirrus: Add services for read operations (T143553) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:26 ebernhardson@deploy1003: Started scap sync-world: Backport for cirrus: Add services for read operations (T143553)
  • 20:20 dancy@deploy1003: Installation of scap version "4.179.1" completed for 2 hosts
  • 20:18 dancy@deploy1003: Installing scap version "4.179.1" for 2 host(s)
  • 20:15 ebernhardson@deploy1003: Finished scap sync-world: Backport for Revert "Enable new mobile search experience everywhere (not including empty search recommendations)" (duration: 10m 45s)
  • 20:13 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2008.codfw.wmnet with OS bookworm
  • 20:13 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 20:12 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 20:11 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2007.codfw.wmnet with OS bookworm
  • 20:11 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 20:10 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 20:08 ebernhardson@deploy1003: ebernhardson, ksarabia: Continuing with sync
  • 20:07 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2009.codfw.wmnet with OS bookworm
  • 20:07 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 20:06 ebernhardson@deploy1003: ebernhardson, ksarabia: Backport for Revert "Enable new mobile search experience everywhere (not including empty search recommendations)" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:06 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 20:04 ebernhardson@deploy1003: Started scap sync-world: Backport for Revert "Enable new mobile search experience everywhere (not including empty search recommendations)"
  • 20:03 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2006.codfw.wmnet with OS bookworm
  • 20:03 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 20:03 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd2007-dev
  • 20:03 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd2007-dev
  • 20:03 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd2006-dev
  • 20:03 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 20:03 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd2006-dev
  • 20:03 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd2005-dev
  • 20:02 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd2005-dev
  • 20:02 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:59 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 19:56 dancy@deploy1003: Installation of scap version "4.179.0" completed for 2 hosts
  • 19:55 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on aux-k8s-worker2008.codfw.wmnet with reason: host reimage
  • 19:54 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2007.codfw.wmnet with reason: host reimage
  • 19:54 dancy@deploy1003: Installing scap version "4.179.0" for 2 host(s)
  • 19:50 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2009.codfw.wmnet with reason: host reimage
  • 19:47 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2006.codfw.wmnet with reason: host reimage
  • 19:44 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2009.codfw.wmnet with reason: host reimage
  • 19:44 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2008.codfw.wmnet with reason: host reimage
  • 19:43 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2007.codfw.wmnet with reason: host reimage
  • 19:43 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2006.codfw.wmnet with reason: host reimage
  • 19:32 ryankemper: T393966 Ran puppet on `titan1001` following merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/1155335. Puppet looks happy and I see the new recording rules getting created
  • 19:31 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2009.codfw.wmnet with OS bookworm
  • 19:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 19:31 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2008.codfw.wmnet with OS bookworm
  • 19:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T396130)', diff saved to https://phabricator.wikimedia.org/P78387 and previous config saved to /var/cache/conftool/dbconfig/20250618-193101-marostegui.json
  • 19:31 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2007.codfw.wmnet with OS bookworm
  • 19:30 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2006.codfw.wmnet with OS bookworm
  • 19:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P78386 and previous config saved to /var/cache/conftool/dbconfig/20250618-191553-marostegui.json
  • 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Testing T395696', diff saved to https://phabricator.wikimedia.org/P78385 and previous config saved to /var/cache/conftool/dbconfig/20250618-191440-ladsgroup.json
  • 19:10 ladsgroup@deploy1003: Finished scap sync-world: Backport for etcd: Check for array key (T395696) (duration: 12m 39s)
  • 19:07 ejegg: civicrm upgraded from 63302c18 to 670b3f6b
  • 19:05 cdobbins@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade of ATS on A:cp-codfw and A:cp - 9.2.10 upgrade (T390912)
  • 19:05 ChrisDobbins901_: cdobbins@cumin2002:~$ sudo -i cookbook sre.cdn.roll-upgrade-ats --query 'A:cp-codfw' --task-id T390912 --reason '9.2.10 upgrade'
  • 19:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 19:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P78384 and previous config saved to /var/cache/conftool/dbconfig/20250618-190045-marostegui.json
  • 19:00 wfan: payments-wiki upgraded from aa102260 to f56db8e6
  • 18:59 ladsgroup@deploy1003: ladsgroup: Backport for etcd: Check for array key (T395696) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:57 ladsgroup@deploy1003: Started scap sync-world: Backport for etcd: Check for array key (T395696)
  • 18:56 ryankemper@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on 6 hosts with reason: T395772 hosts not serving production traffic
  • 18:55 ladsgroup@deploy1003: Finished scap sync-world: Backport for etcd: Remove ES clusters from "write clusters" if section is RO (T395696) (duration: 26m 55s)
  • 18:49 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 18:48 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade of ATS on A:cp-eqiad and A:cp - 9.2.10 upgrade (T390912)
  • 18:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T396130)', diff saved to https://phabricator.wikimedia.org/P78383 and previous config saved to /var/cache/conftool/dbconfig/20250618-184538-marostegui.json
  • 18:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Testing T395696', diff saved to https://phabricator.wikimedia.org/P78382 and previous config saved to /var/cache/conftool/dbconfig/20250618-184325-ladsgroup.json
  • 18:38 cdobbins@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade of ATS on A:cp-eqsin and A:cp - 9.2.10 upgrade (T390912)
  • 18:31 ladsgroup@deploy1003: ladsgroup: Backport for etcd: Remove ES clusters from "write clusters" if section is RO (T395696) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:29 ladsgroup@deploy1003: Started scap sync-world: Backport for etcd: Remove ES clusters from "write clusters" if section is RO (T395696)
  • 18:28 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:27 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:27 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:26 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:26 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:26 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:26 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:26 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:25 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:24 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:23 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1251 (T396130)', diff saved to https://phabricator.wikimedia.org/P78381 and previous config saved to /var/cache/conftool/dbconfig/20250618-182313-marostegui.json
  • 18:23 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1251.eqiad.wmnet with reason: Maintenance
  • 18:22 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:20 jgleeson: civicrm rolled back from 10eac2f8 to 63302c18
  • 18:18 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:17 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:17 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aux-k8s-worker2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:17 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:17 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aux-k8s-worker2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:16 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:16 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aux-k8s-worker2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:15 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:14 ladsgroup@deploy1003: Finished deploy [performance/arc-lamp@76afb89]: Deploy arclamp (duration: 00m 08s)
  • 18:14 ladsgroup@deploy1003: Started deploy [performance/arc-lamp@76afb89]: Deploy arclamp
  • 18:13 ladsgroup@deploy1003: sync-world aborted: Deploy arclamp (duration: 00m 33s)
  • 18:13 ladsgroup@deploy1003: Started scap sync-world: Deploy arclamp
  • 18:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 18:04 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:03 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:03 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:03 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:59 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1048.eqiad.wmnet
  • 17:58 swfrench-wmf: migrated all shellbox instances to bookworm-based httpd images in eqiad - T378128
  • 17:58 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 17:57 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 17:57 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 17:56 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 17:56 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:56 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:55 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 17:55 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 17:54 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 17:54 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 17:54 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 17:54 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Running queries (T385167)
  • 17:53 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 17:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db2155 for queries (T385167)', diff saved to https://phabricator.wikimedia.org/P78379 and previous config saved to /var/cache/conftool/dbconfig/20250618-175206-ladsgroup.json
  • 17:50 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2011.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:50 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:50 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host rdb2011.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:49 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host rdb2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:49 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:48 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2011.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 17:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T396130)', diff saved to https://phabricator.wikimedia.org/P78378 and previous config saved to /var/cache/conftool/dbconfig/20250618-174632-marostegui.json
  • 17:43 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host rdb2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:43 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host rdb2011.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:40 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host rdb2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:39 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host rdb2011.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:39 swfrench-wmf: migrated all shellbox instances to bookworm-based httpd images in codfw - T378128
  • 17:37 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 17:37 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 17:36 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 17:36 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 17:35 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:35 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:35 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7001.*
  • 17:35 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 17:34 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 17:34 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 17:33 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 17:33 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 17:32 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 17:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P78377 and previous config saved to /var/cache/conftool/dbconfig/20250618-173124-marostegui.json
  • 17:29 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 17:29 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 17:29 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 17:28 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 17:28 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:28 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:28 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 17:28 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 17:28 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 17:28 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade of ATS on A:cp-eqiad and A:cp - 9.2.10 upgrade (T390912)
  • 17:28 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 17:28 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 17:27 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 17:22 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:17 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 17:17 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 17:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P78376 and previous config saved to /var/cache/conftool/dbconfig/20250618-171617-marostegui.json
  • 17:16 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 17:15 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 17:11 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:09 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:09 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host rdb2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:09 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host rdb2011.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:06 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:04 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 17:03 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2009
  • 17:03 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host sretest2009
  • 17:03 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2006
  • 17:03 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host sretest2006
  • 17:03 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2004
  • 17:03 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host sretest2004
  • 17:02 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
  • 17:02 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
  • 17:02 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host rdb2012
  • 17:02 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host rdb2012
  • 17:02 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host rdb2011
  • 17:02 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host rdb2011
  • 17:02 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host build2003
  • 17:01 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host build2003
  • 17:01 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host aux-k8s-worker2009
  • 17:01 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host aux-k8s-worker2009
  • 17:01 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host aux-k8s-worker2008
  • 17:01 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host aux-k8s-worker2008
  • 17:01 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host aux-k8s-worker2007
  • 17:01 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host aux-k8s-worker2007
  • 17:01 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host aux-k8s-worker2006
  • 17:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T396130)', diff saved to https://phabricator.wikimedia.org/P78375 and previous config saved to /var/cache/conftool/dbconfig/20250618-170109-marostegui.json
  • 17:01 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host aux-k8s-worker2006
  • 17:01 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:01 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding rdb2011 to codfw - jhancock@cumin1003"
  • 17:00 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding rdb2011 to codfw - jhancock@cumin1003"
  • 17:00 dancy@deploy1003: Finished scap sync-world: Backport for Update entries on https://www.mediawiki.org/keys/keys.html (T364694) (duration: 10m 09s)
  • 16:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:56 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd2007.codfw.wmnet with OS bullseye
  • 16:55 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 16:53 dancy@deploy1003: dancy, aklapper: Continuing with sync
  • 16:52 dancy@deploy1003: dancy, aklapper: Backport for Update entries on https://www.mediawiki.org/keys/keys.html (T364694) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:50 dancy@deploy1003: Started scap sync-world: Backport for Update entries on https://www.mediawiki.org/keys/keys.html (T364694)
  • 16:50 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd2005.codfw.wmnet with OS bullseye
  • 16:47 cdobbins@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade of ATS on A:cp-eqsin and A:cp - 9.2.10 upgrade (T390912)
  • 16:47 ChrisDobbins901_: cdobbins@cumin2002:~$ sudo -i cookbook sre.cdn.roll-upgrade-ats --query 'A:cp-eqsin' --task-id T390912 --reason '9.2.10 upgrade'
  • 16:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T396130)', diff saved to https://phabricator.wikimedia.org/P78374 and previous config saved to /var/cache/conftool/dbconfig/20250618-164041-marostegui.json
  • 16:40 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 16:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T396130)', diff saved to https://phabricator.wikimedia.org/P78373 and previous config saved to /var/cache/conftool/dbconfig/20250618-164019-marostegui.json
  • 16:34 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 16:34 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 16:34 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 16:33 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 16:31 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 16:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 16:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P78372 and previous config saved to /var/cache/conftool/dbconfig/20250618-162511-marostegui.json
  • 16:23 hnowlan@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:21 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade of ATS on A:esams or A:drmrs and A:cp - 9.2.10 upgrade (T390912)
  • 16:21 hnowlan@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:20 hnowlan@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:20 btullis@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts an-mariadb1002.eqiad.wmnet
  • 16:20 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-mariadb1002.eqiad.wmnet
  • 16:19 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2005.codfw.wmnet with OS bullseye
  • 16:19 hnowlan@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:16 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:15 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:11 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:10 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-mariadb1002.eqiad.wmnet
  • 16:10 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:10 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp7001.magru.wmnet with reason: BIOS upgrades
  • 16:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P78371 and previous config saved to /var/cache/conftool/dbconfig/20250618-161003-marostegui.json
  • 16:03 btullis@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-mariadb1002.eqiad.wmnet
  • 16:03 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:02 btullis@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on an-mariadb1002.eqiad.wmnet with reason: Upgrading SSD firmware
  • 15:59 swfrench-wmf: deployed conftool 5.3.0 to all bullseye and bookworm hosts - T395696
  • 15:59 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:58 btullis@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts an-coord1003.eqiad.wmnet
  • 15:57 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T396130)', diff saved to https://phabricator.wikimedia.org/P78370 and previous config saved to /var/cache/conftool/dbconfig/20250618-155455-marostegui.json
  • 15:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:45 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp7001.*
  • 15:45 brett: Depooling cp7001 for firmware upgrades re: thermal support ticket - T386959
  • 15:44 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd2005.codfw.wmnet with OS bullseye
  • 15:41 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-codfw and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
  • 15:39 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 15:38 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2007.codfw.wmnet with OS bullseye
  • 15:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd2007.codfw.wmnet with OS bullseye
  • 15:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd2006.codfw.wmnet with OS bullseye
  • 15:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T396130)', diff saved to https://phabricator.wikimedia.org/P78369 and previous config saved to /var/cache/conftool/dbconfig/20250618-153448-marostegui.json
  • 15:34 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 15:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T396130)', diff saved to https://phabricator.wikimedia.org/P78368 and previous config saved to /var/cache/conftool/dbconfig/20250618-153425-marostegui.json
  • 15:31 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 15:31 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 15:30 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 15:30 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 15:30 lucaswerkmeister-wmde@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 15:29 lucaswerkmeister-wmde@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 15:29 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 15:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 15:27 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 15:21 ejegg: enabled queue consumer to send donor portal login links
  • 15:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P78367 and previous config saved to /var/cache/conftool/dbconfig/20250618-151918-marostegui.json
  • 15:09 dancy@deploy1003: Finished scap sync-world: Testing T396166 (duration: 08m 37s)
  • 15:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P78366 and previous config saved to /var/cache/conftool/dbconfig/20250618-150410-marostegui.json
  • 15:03 btullis@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1003.eqiad.wmnet
  • 15:01 dancy@deploy1003: Started scap sync-world: Testing T396166
  • 14:59 ejegg: civicrm upgraded from 63302c18 to 10eac2f8
  • 14:51 jmm@puppetserver1001: conftool action : set/pooled=yes; selector: name=ncredir7002.magru.wmnet
  • 14:51 jmm@puppetserver1001: conftool action : set/pooled=no; selector: name=ncredir7002.magru.wmnet
  • 14:50 swfrench-wmf: reprepro included conftool 5.3.0 in apt.wikimedia.org - T395696
  • 14:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T396130)', diff saved to https://phabricator.wikimedia.org/P78365 and previous config saved to /var/cache/conftool/dbconfig/20250618-144903-marostegui.json
  • 14:48 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1048.eqiad.wmnet
  • 14:47 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1048.eqiad.wmnet
  • 14:45 bking@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:45 bking@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:41 bking@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:41 bking@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:40 James_F: Running `mwscript-k8s --php_version=8.1 -f -- extensions/WikiLambda/maintenance/updateSecondaryTables.php --wiki=wikifunctionswiki --cache --verbose --zType Z8` for T396449
  • 14:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T396130)', diff saved to https://phabricator.wikimedia.org/P78364 and previous config saved to /var/cache/conftool/dbconfig/20250618-142852-marostegui.json
  • 14:28 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 14:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T396130)', diff saved to https://phabricator.wikimedia.org/P78363 and previous config saved to /var/cache/conftool/dbconfig/20250618-142829-marostegui.json
  • 14:26 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:24 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:24 bking@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:24 bking@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:21 bking@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:21 bking@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd2007.codfw.wmnet with OS bullseye
  • 14:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd2007']
  • 14:19 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd2007']
  • 14:18 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:18 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:17 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:17 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:15 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:13 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P78360 and previous config saved to /var/cache/conftool/dbconfig/20250618-141322-marostegui.json
  • 14:13 jnuche@deploy1003: Installation of scap version "4.178.3" completed for 4 hosts
  • 14:10 jnuche@deploy1003: Installing scap version "4.178.3" for 4 host(s)
  • 14:09 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:09 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:08 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:08 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd2006.codfw.wmnet with OS bullseye
  • 14:05 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd2006']
  • 14:05 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd2006']
  • 14:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:03 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd2005.codfw.wmnet with OS bullseye
  • 13:59 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd2005']
  • 13:59 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd2005']
  • 13:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P78359 and previous config saved to /var/cache/conftool/dbconfig/20250618-135814-marostegui.json
  • 13:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudcephosd2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:54 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 13:54 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudcephosd2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T396130)', diff saved to https://phabricator.wikimedia.org/P78358 and previous config saved to /var/cache/conftool/dbconfig/20250618-134307-marostegui.json
  • 13:42 moritzm: installing net-tools regression updates on Bullseye
  • 13:39 jnuche@deploy1003: Installation of scap version "4.178.2" completed for 4 hosts
  • 13:37 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs5004.eqsin.wmnet} and A:liberica (T396561)
  • 13:37 vgutierrez: repool lvs5004 (text) using katran - T396561
  • 13:37 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs5004.eqsin.wmnet} and A:liberica (T396561)
  • 13:36 jnuche@deploy1003: Installing scap version "4.178.2" for 4 host(s)
  • 13:30 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs5004.eqsin.wmnet
  • 13:30 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs5004.eqsin.wmnet
  • 13:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudcephosd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:26 kartik@deploy1003: Finished scap sync-world: Backport for Enable the Contribute menu in 8th group of Wikipedias (T395084) (duration: 11m 57s)
  • 13:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T396130)', diff saved to https://phabricator.wikimedia.org/P78357 and previous config saved to /var/cache/conftool/dbconfig/20250618-132242-marostegui.json
  • 13:22 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 13:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T396130)', diff saved to https://phabricator.wikimedia.org/P78356 and previous config saved to /var/cache/conftool/dbconfig/20250618-132220-marostegui.json
  • 13:19 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 13:19 kartik@deploy1003: kartik: Continuing with sync
  • 13:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudcephosd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:16 kartik@deploy1003: kartik: Backport for Enable the Contribute menu in 8th group of Wikipedias (T395084) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:15 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 13:14 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs5004.eqsin.wmnet with reason: switching to katran
  • 13:14 kartik@deploy1003: Started scap sync-world: Backport for Enable the Contribute menu in 8th group of Wikipedias (T395084)
  • 13:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudcephosd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:14 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs5004.eqsin.wmnet} and A:liberica (T396561)
  • 13:13 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin depooling P{lvs5004.eqsin.wmnet} and A:liberica (T396561)
  • 13:10 jnuche@deploy1003: Installation of scap version "4.178.1" completed for 4 hosts
  • 13:08 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade of ATS on A:esams or A:drmrs and A:cp - 9.2.10 upgrade (T390912)
  • 13:07 jnuche@deploy1003: Installing scap version "4.178.1" for 4 host(s)
  • 13:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P78355 and previous config saved to /var/cache/conftool/dbconfig/20250618-130713-marostegui.json
  • 13:00 jynus: bacula director migration finalized, backup1014 is the new bacula director. backup1001 should no longer be used. T387892
  • 12:55 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 12:54 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 12:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P78354 and previous config saved to /var/cache/conftool/dbconfig/20250618-125206-marostegui.json
  • 12:45 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1048.eqiad.wmnet
  • 12:43 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1048.eqiad.wmnet
  • 12:43 elukey: drop old Thanos Swift's Tegola tile cache containers - T396584
  • 12:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T396130)', diff saved to https://phabricator.wikimedia.org/P78353 and previous config saved to /var/cache/conftool/dbconfig/20250618-123658-marostegui.json
  • 12:34 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 12:33 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 12:32 jnuche@deploy1003: Installing scap version "4.178.0" for 183 host(s)
  • 12:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T396130)', diff saved to https://phabricator.wikimedia.org/P78352 and previous config saved to /var/cache/conftool/dbconfig/20250618-121646-marostegui.json
  • 12:16 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 12:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T396130)', diff saved to https://phabricator.wikimedia.org/P78351 and previous config saved to /var/cache/conftool/dbconfig/20250618-121624-marostegui.json
  • 12:15 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 12:15 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 12:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P78350 and previous config saved to /var/cache/conftool/dbconfig/20250618-120117-marostegui.json
  • 11:56 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
  • 11:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
  • 11:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
  • 11:55 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 11:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
  • 11:54 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 11:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P78349 and previous config saved to /var/cache/conftool/dbconfig/20250618-114610-marostegui.json
  • 11:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2191 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78348 and previous config saved to /var/cache/conftool/dbconfig/20250618-113125-root.json
  • 11:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T396130)', diff saved to https://phabricator.wikimedia.org/P78347 and previous config saved to /var/cache/conftool/dbconfig/20250618-113103-marostegui.json
  • 11:27 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 11:26 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 11:25 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:24 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:21 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:21 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:20 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:19 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2191 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78345 and previous config saved to /var/cache/conftool/dbconfig/20250618-111620-root.json
  • 11:13 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1048.eqiad.wmnet
  • 11:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T396130)', diff saved to https://phabricator.wikimedia.org/P78344 and previous config saved to /var/cache/conftool/dbconfig/20250618-111239-marostegui.json
  • 11:12 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 11:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T396130)', diff saved to https://phabricator.wikimedia.org/P78343 and previous config saved to /var/cache/conftool/dbconfig/20250618-111217-marostegui.json
  • 11:09 root@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host backup1009.eqiad.wmnet
  • 11:07 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 11:07 root@cumin1002: START - Cookbook sre.puppet.migrate-host for host backup1009.eqiad.wmnet
  • 11:06 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 11:06 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 11:04 btullis@dns1004: END - running authdns-update
  • 11:03 btullis@dns1004: START - running authdns-update
  • 11:02 root@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host backup1009.eqiad.wmnet
  • 11:02 root@cumin1002: START - Cookbook sre.puppet.migrate-host for host backup1009.eqiad.wmnet
  • 11:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2191 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78342 and previous config saved to /var/cache/conftool/dbconfig/20250618-110114-root.json
  • 10:59 reedy@deploy1003: Finished scap sync-world: Backport for composer: Various updates, Setup json linting (T397191), Improve function and property documentation for php code (T171115) (duration: 10m 20s)
  • 10:58 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2030.codfw.wmnet
  • 10:58 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2030.codfw.wmnet
  • 10:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P78341 and previous config saved to /var/cache/conftool/dbconfig/20250618-105710-marostegui.json
  • 10:54 btullis@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts an-coord1003.eqiad.wmnet
  • 10:54 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1003.eqiad.wmnet
  • 10:52 root@cumin1002: DONE (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for backup1009.eqiad.wmnet: Renew puppet certificate - root@cumin1002
  • 10:52 reedy@deploy1003: umherirrender, reedy: Continuing with sync
  • 10:51 reedy@deploy1003: umherirrender, reedy: Backport for composer: Various updates, Setup json linting (T397191), Improve function and property documentation for php code (T171115) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:49 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2030.codfw.wmnet
  • 10:49 reedy@deploy1003: Started scap sync-world: Backport for composer: Various updates, Setup json linting (T397191), Improve function and property documentation for php code (T171115)
  • 10:48 root@cumin1002: DONE (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for backup1009.eqiad.wmnet: Renew puppet certificate - root@cumin1002
  • 10:47 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2030.codfw.wmnet
  • 10:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2191 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78340 and previous config saved to /var/cache/conftool/dbconfig/20250618-104609-root.json
  • 10:43 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-coord1003.eqiad.wmnet
  • 10:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P78339 and previous config saved to /var/cache/conftool/dbconfig/20250618-104203-marostegui.json
  • 10:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2191.codfw.wmnet with reason: Maintenance
  • 10:40 btullis@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1003.eqiad.wmnet
  • 10:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2191', diff saved to https://phabricator.wikimedia.org/P78338 and previous config saved to /var/cache/conftool/dbconfig/20250618-104033-root.json
  • 10:40 btullis@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on an-coord1003.eqiad.wmnet with reason: Upgrading SSD firmware
  • 10:31 jmm@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2024.codfw.wmnet with reason: remove for decom
  • 10:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T396130)', diff saved to https://phabricator.wikimedia.org/P78337 and previous config saved to /var/cache/conftool/dbconfig/20250618-102655-marostegui.json
  • 10:20 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup[1001,1014].eqiad.wmnet with reason: Backup director migration
  • 10:20 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb1003.eqiad.wmnet
  • 10:18 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2212* slowly with 10 steps - Pooling in
  • 10:16 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host netboxdb1003.eqiad.wmnet
  • 10:14 jynus: starting backup director migration backup1001 -> backup1014 T387892
  • 10:10 jayme@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-cluster (exit_code=99) depool 44 services in codfw/codfw: pre-upgrade-test
  • 10:10 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-cluster depool 44 services in codfw/codfw: pre-upgrade-test
  • 10:06 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 10:06 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 10:05 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 10:05 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 10:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T396130)', diff saved to https://phabricator.wikimedia.org/P78335 and previous config saved to /var/cache/conftool/dbconfig/20250618-100329-marostegui.json
  • 10:03 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:03 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 10:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T396130)', diff saved to https://phabricator.wikimedia.org/P78333 and previous config saved to /var/cache/conftool/dbconfig/20250618-100300-marostegui.json
  • 10:02 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 10:02 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 09:56 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 09:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P78331 and previous config saved to /var/cache/conftool/dbconfig/20250618-094752-marostegui.json
  • 09:46 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 09:44 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2003.codfw.wmnet
  • 09:40 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host netboxdb2003.codfw.wmnet
  • 09:39 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 09:34 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh7004.wikimedia.org
  • 09:34 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh7004.wikimedia.org with OS bookworm
  • 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P78329 and previous config saved to /var/cache/conftool/dbconfig/20250618-093245-marostegui.json
  • 09:29 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 09:28 jmm@cumin1003: END (PASS) - Cookbook sre.netbox.restart-reboot (exit_code=0) rolling reboot on A:netbox
  • 09:24 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet
  • 09:18 fnegri@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1020.eqiad.wmnet with reason: Upgrading clouddbs T394372
  • 09:18 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh7004.wikimedia.org with reason: host reimage
  • 09:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T396130)', diff saved to https://phabricator.wikimedia.org/P78327 and previous config saved to /var/cache/conftool/dbconfig/20250618-091738-marostegui.json
  • 09:15 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
  • 09:15 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
  • 09:15 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 09:13 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on doh7004.wikimedia.org with reason: host reimage
  • 09:11 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
  • 09:11 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
  • 09:11 jmm@cumin1003: START - Cookbook sre.netbox.restart-reboot rolling reboot on A:netbox
  • 09:05 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 09:04 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs5005.eqsin.wmnet} and A:liberica (T396561)
  • 09:04 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs5005.eqsin.wmnet} and A:liberica (T396561)
  • 09:04 vgutierrez: repool lvs5005 (upload) using katran - T396561
  • 09:03 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet
  • 09:03 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1016.eqiad.wmnet
  • 09:03 btullis@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts an-coord1003.eqiad.wmnet
  • 09:02 btullis@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1003.eqiad.wmnet
  • 09:01 btullis@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts an-coord1003.eqiad.wmnet
  • 09:00 btullis@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1003.eqiad.wmnet
  • 08:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T396130)', diff saved to https://phabricator.wikimedia.org/P78325 and previous config saved to /var/cache/conftool/dbconfig/20250618-085417-marostegui.json
  • 08:54 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1195.eqiad.wmnet with reason: Maintenance
  • 08:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T396130)', diff saved to https://phabricator.wikimedia.org/P78324 and previous config saved to /var/cache/conftool/dbconfig/20250618-085354-marostegui.json
  • 08:50 hashar@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.6 refs T392176
  • 08:46 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host doh7004.wikimedia.org with OS bookworm
  • 08:42 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh7004.wikimedia.org - jmm@cumin1003"
  • 08:42 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh7004.wikimedia.org - jmm@cumin1003"
  • 08:42 btullis@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts an-coord1003.eqiad.wmnet
  • 08:42 btullis@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1003.eqiad.wmnet
  • 08:42 fnegri@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1016.eqiad.wmnet with reason: Upgrading clouddbs T394372
  • 08:42 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh7004.wikimedia.org on all recursors
  • 08:42 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache doh7004.wikimedia.org on all recursors
  • 08:42 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:42 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh7004.wikimedia.org - jmm@cumin1003"
  • 08:41 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh7004.wikimedia.org - jmm@cumin1003"
  • 08:41 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet
  • 08:41 btullis@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts an-coord1003.eqiad.wmnet
  • 08:40 btullis@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1003.eqiad.wmnet
  • 08:39 btullis@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on an-coord1003.eqiad.wmnet with reason: Upgrading SSD firmware
  • 08:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P78322 and previous config saved to /var/cache/conftool/dbconfig/20250618-083847-marostegui.json
  • 08:38 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs5005.eqsin.wmnet
  • 08:38 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs5005.eqsin.wmnet
  • 08:38 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 08:38 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host doh7004.wikimedia.org
  • 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet
  • 08:25 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs5005.eqsin.wmnet with reason: switching to katran
  • 08:25 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs5005.eqsin.wmnet} and A:liberica (T396561)
  • 08:24 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin depooling P{lvs5005.eqsin.wmnet} and A:liberica (T396561)
  • 08:24 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet
  • 08:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P78320 and previous config saved to /var/cache/conftool/dbconfig/20250618-082340-marostegui.json
  • 08:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet
  • 08:21 moritzm: rearm keyholder on cumin2002
  • 08:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2231 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78319 and previous config saved to /var/cache/conftool/dbconfig/20250618-082035-root.json
  • 08:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78317 and previous config saved to /var/cache/conftool/dbconfig/20250618-081657-root.json
  • 08:13 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
  • 08:08 phuedx: UTC morning backport window finished
  • 08:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T396130)', diff saved to https://phabricator.wikimedia.org/P78316 and previous config saved to /var/cache/conftool/dbconfig/20250618-080833-marostegui.json
  • 08:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2231 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78314 and previous config saved to /var/cache/conftool/dbconfig/20250618-080528-root.json
  • 08:05 phuedx@deploy1003: Finished scap sync-world: Backport for ext.wikimediaEvents: Repurpose PageVisit instrument (T397138) (duration: 16m 55s)
  • 08:02 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2212* slowly with 10 steps - Pooling in
  • 08:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78312 and previous config saved to /var/cache/conftool/dbconfig/20250618-080152-root.json
  • 07:58 phuedx@deploy1003: phuedx: Continuing with sync
  • 07:50 phuedx@deploy1003: phuedx: Backport for ext.wikimediaEvents: Repurpose PageVisit instrument (T397138) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2231 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78311 and previous config saved to /var/cache/conftool/dbconfig/20250618-075022-root.json
  • 07:48 phuedx@deploy1003: Started scap sync-world: Backport for ext.wikimediaEvents: Repurpose PageVisit instrument (T397138)
  • 07:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78310 and previous config saved to /var/cache/conftool/dbconfig/20250618-074646-root.json
  • 07:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T396130)', diff saved to https://phabricator.wikimedia.org/P78309 and previous config saved to /var/cache/conftool/dbconfig/20250618-074521-marostegui.json
  • 07:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 07:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T396130)', diff saved to https://phabricator.wikimedia.org/P78308 and previous config saved to /var/cache/conftool/dbconfig/20250618-074459-marostegui.json
  • 07:43 kartik@deploy1003: Finished scap sync-world: Backport for Enable the Contribute menu on new Wikipedias automatically (T395031 T381371) (duration: 11m 56s)
  • 07:43 ryankemper: T386098 Killed the `wdqs-main` reload, it can be started up again on the new cumin later
  • 07:42 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:41 ryankemper@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97) reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20250526/ using stat1009.eqiad.wmnet)
  • 07:36 brouberol@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-jumbo1018.eqiad.wmnet with OS bullseye
  • 07:36 kartik@deploy1003: kartik: Continuing with sync
  • 07:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2231 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78307 and previous config saved to /var/cache/conftool/dbconfig/20250618-073517-root.json
  • 07:33 kartik@deploy1003: kartik: Backport for Enable the Contribute menu on new Wikipedias automatically (T395031 T381371) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78306 and previous config saved to /var/cache/conftool/dbconfig/20250618-073140-root.json
  • 07:31 kartik@deploy1003: Started scap sync-world: Backport for Enable the Contribute menu on new Wikipedias automatically (T395031 T381371)
  • 07:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P78305 and previous config saved to /var/cache/conftool/dbconfig/20250618-072951-marostegui.json
  • 07:29 gkyziridis@deploy1003: Finished scap sync-world: Backport for ores-extension: enable extension with revertrisk filter for the third batch of wikis (T395824) (duration: 23m 35s)
  • 07:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2231.codfw.wmnet with reason: Maintenance
  • 07:24 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2231.codfw.wmnet with reason: Maintenance
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2231', diff saved to https://phabricator.wikimedia.org/P78304 and previous config saved to /var/cache/conftool/dbconfig/20250618-072404-root.json
  • 07:22 gkyziridis@deploy1003: gkyziridis: Continuing with sync
  • 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P78303 and previous config saved to /var/cache/conftool/dbconfig/20250618-071634-root.json
  • 07:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P78302 and previous config saved to /var/cache/conftool/dbconfig/20250618-071443-marostegui.json
  • 07:12 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) check 44 services in codfw: maintenance
  • 07:12 jayme@cumin1002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check 48 services: maintenance
  • 07:12 jayme@cumin1002: START - Cookbook sre.discovery.service-route check 48 services: maintenance
  • 07:12 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-cluster check 44 services in codfw: maintenance
  • 07:08 gkyziridis@deploy1003: gkyziridis: Backport for ores-extension: enable extension with revertrisk filter for the third batch of wikis (T395824) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:06 gkyziridis@deploy1003: Started scap sync-world: Backport for ores-extension: enable extension with revertrisk filter for the third batch of wikis (T395824)
  • 07:03 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 10 hosts with reason: Maintenance
  • 07:03 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[1155-1156].eqiad.wmnet with reason: Maintenance
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1156', diff saved to https://phabricator.wikimedia.org/P78301 and previous config saved to /var/cache/conftool/dbconfig/20250618-070239-root.json
  • 06:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T396130)', diff saved to https://phabricator.wikimedia.org/P78300 and previous config saved to /var/cache/conftool/dbconfig/20250618-065936-marostegui.json
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1188 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78299 and previous config saved to /var/cache/conftool/dbconfig/20250618-063921-root.json
  • 06:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T396130)', diff saved to https://phabricator.wikimedia.org/P78298 and previous config saved to /var/cache/conftool/dbconfig/20250618-063608-marostegui.json
  • 06:36 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 06:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T396130)', diff saved to https://phabricator.wikimedia.org/P78297 and previous config saved to /var/cache/conftool/dbconfig/20250618-063546-marostegui.json
  • 06:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1188 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78296 and previous config saved to /var/cache/conftool/dbconfig/20250618-062416-root.json
  • 06:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P78295 and previous config saved to /var/cache/conftool/dbconfig/20250618-062038-marostegui.json
  • 06:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78294 and previous config saved to /var/cache/conftool/dbconfig/20250618-061555-root.json
  • 06:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1188 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P78293 and previous config saved to /var/cache/conftool/dbconfig/20250618-060910-root.json
  • 06:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P78292 and previous config saved to /var/cache/conftool/dbconfig/20250618-060531-marostegui.json
  • 06:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78291 and previous config saved to /var/cache/conftool/dbconfig/20250618-060049-root.json
  • 05:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1188 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P78290 and previous config saved to /var/cache/conftool/dbconfig/20250618-055404-root.json
  • 05:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T396130)', diff saved to https://phabricator.wikimedia.org/P78289 and previous config saved to /var/cache/conftool/dbconfig/20250618-055023-marostegui.json
  • 05:47 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Maintenance
  • 05:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78288 and previous config saved to /var/cache/conftool/dbconfig/20250618-054543-root.json
  • 05:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1188 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78287 and previous config saved to /var/cache/conftool/dbconfig/20250618-053858-root.json
  • 05:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 05:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1188', diff saved to https://phabricator.wikimedia.org/P78286 and previous config saved to /var/cache/conftool/dbconfig/20250618-053253-root.json
  • 05:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78285 and previous config saved to /var/cache/conftool/dbconfig/20250618-053038-root.json
  • 05:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T396130)', diff saved to https://phabricator.wikimedia.org/P78284 and previous config saved to /var/cache/conftool/dbconfig/20250618-052645-marostegui.json
  • 05:26 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 05:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 05:20 marostegui@dns1006: END - running authdns-update
  • 05:19 marostegui@dns1006: START - running authdns-update
  • 05:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1173 T397198', diff saved to https://phabricator.wikimedia.org/P78283 and previous config saved to /var/cache/conftool/dbconfig/20250618-051935-root.json
  • 05:18 marostegui@cumin1002: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T397198', diff saved to https://phabricator.wikimedia.org/P78281 and previous config saved to /var/cache/conftool/dbconfig/20250618-051812-root.json
  • 05:18 marostegui: Starting s6 eqiad failover from db1173 to db1201 - T397198
  • 04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1201 from API/vslow/dump T397198', diff saved to https://phabricator.wikimedia.org/P78279 and previous config saved to /var/cache/conftool/dbconfig/20250618-045821-marostegui.json
  • 04:57 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s6 T397198
  • 04:57 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1201 with weight 0 T397198', diff saved to https://phabricator.wikimedia.org/P78278 and previous config saved to /var/cache/conftool/dbconfig/20250618-045741-marostegui.json
  • 04:57 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 04:44 ryankemper: [WDQS] Restarted blazegraph on `wdqs2009` just in case it's locked up
  • 04:40 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cirrussearch[2055-2060].codfw.wmnet
  • 04:40 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 04:40 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cirrussearch[2055-2060].codfw.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin2002"
  • 04:39 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cirrussearch[2055-2060].codfw.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin2002"
  • 04:34 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 04:04 ryankemper@cumin2002: START - Cookbook sre.hosts.decommission for hosts cirrussearch[2055-2060].codfw.wmnet
  • 01:33 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 01:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T382778)', diff saved to https://phabricator.wikimedia.org/P78277 and previous config saved to /var/cache/conftool/dbconfig/20250618-013307-ladsgroup.json
  • 01:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P78276 and previous config saved to /var/cache/conftool/dbconfig/20250618-011800-ladsgroup.json
  • 01:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P78275 and previous config saved to /var/cache/conftool/dbconfig/20250618-010253-ladsgroup.json
  • 00:52 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 00:51 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 00:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T382778)', diff saved to https://phabricator.wikimedia.org/P78274 and previous config saved to /var/cache/conftool/dbconfig/20250618-004745-ladsgroup.json
  • 00:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T382778)', diff saved to https://phabricator.wikimedia.org/P78273 and previous config saved to /var/cache/conftool/dbconfig/20250618-004434-ladsgroup.json
  • 00:44 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 00:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T382778)', diff saved to https://phabricator.wikimedia.org/P78272 and previous config saved to /var/cache/conftool/dbconfig/20250618-004423-ladsgroup.json
  • 00:40 krinkle@deploy1003: Finished scap sync-world: Backport for multiversion: Re-use prod for beta setSiteInfoForWiki (T289318) (duration: 13m 23s)
  • 00:33 krinkle@deploy1003: krinkle: Continuing with sync
  • 00:30 sukhe@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on durum7003.magru.wmnet with reason: insetup host; will resolve service errors later
  • 00:29 krinkle@deploy1003: krinkle: Backport for multiversion: Re-use prod for beta setSiteInfoForWiki (T289318) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P78271 and previous config saved to /var/cache/conftool/dbconfig/20250618-002915-ladsgroup.json
  • 00:27 krinkle@deploy1003: Started scap sync-world: Backport for multiversion: Re-use prod for beta setSiteInfoForWiki (T289318)
  • 00:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P78270 and previous config saved to /var/cache/conftool/dbconfig/20250618-001408-ladsgroup.json
  • 00:12 cdobbins@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp30[66-81].esams.wmnet} and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)

2025-06-17

  • 23:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T382778)', diff saved to https://phabricator.wikimedia.org/P78269 and previous config saved to /var/cache/conftool/dbconfig/20250617-235900-ladsgroup.json
  • 23:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T382778)', diff saved to https://phabricator.wikimedia.org/P78268 and previous config saved to /var/cache/conftool/dbconfig/20250617-235543-ladsgroup.json
  • 23:55 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 23:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T382778)', diff saved to https://phabricator.wikimedia.org/P78267 and previous config saved to /var/cache/conftool/dbconfig/20250617-235521-ladsgroup.json
  • 23:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P78266 and previous config saved to /var/cache/conftool/dbconfig/20250617-234013-ladsgroup.json
  • 23:38 krinkle@deploy1003: Finished scap sync-world: Backport for multiversion: Remove routing for former `deploymentwiki` in Beta (T198673 T289318) (duration: 14m 00s)
  • 23:31 krinkle@deploy1003: krinkle: Continuing with sync
  • 23:30 brennen@deploy1003: Finished deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling) (duration: 63m 03s)
  • 23:26 krinkle@deploy1003: krinkle: Backport for multiversion: Remove routing for former `deploymentwiki` in Beta (T198673 T289318) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P78265 and previous config saved to /var/cache/conftool/dbconfig/20250617-232506-ladsgroup.json
  • 23:24 krinkle@deploy1003: Started scap sync-world: Backport for multiversion: Remove routing for former `deploymentwiki` in Beta (T198673 T289318)
  • 23:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T382778)', diff saved to https://phabricator.wikimedia.org/P78264 and previous config saved to /var/cache/conftool/dbconfig/20250617-230959-ladsgroup.json
  • 23:07 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-coord1002.eqiad.wmnet with OS bullseye
  • 23:07 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:07 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T382778)', diff saved to https://phabricator.wikimedia.org/P78263 and previous config saved to /var/cache/conftool/dbconfig/20250617-230639-ladsgroup.json
  • 23:06 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 23:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T382778)', diff saved to https://phabricator.wikimedia.org/P78262 and previous config saved to /var/cache/conftool/dbconfig/20250617-230616-ladsgroup.json
  • 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-master1003.eqiad.wmnet with OS bullseye
  • 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:05 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-master1004.eqiad.wmnet with OS bullseye
  • 23:03 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:01 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 22:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-coord1002.eqiad.wmnet with reason: host reimage
  • 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudcephosd2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P78261 and previous config saved to /var/cache/conftool/dbconfig/20250617-225108-ladsgroup.json
  • 22:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1003.eqiad.wmnet with reason: host reimage
  • 22:48 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-coord1002.eqiad.wmnet with reason: host reimage
  • 22:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1004.eqiad.wmnet with reason: host reimage
  • 22:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudcephosd2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:43 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1003.eqiad.wmnet with reason: host reimage
  • 22:43 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1004.eqiad.wmnet with reason: host reimage
  • 22:40 krinkle@deploy1003: Finished scap sync-world: Backport for multiversion: Remove unused newFromDBName() (duration: 11m 37s)
  • 22:37 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS bullseye
  • 22:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:36 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-drmrs and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
  • 22:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P78260 and previous config saved to /var/cache/conftool/dbconfig/20250617-223601-ladsgroup.json
  • 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudcephosd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd2007
  • 22:35 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd2007
  • 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd2006
  • 22:35 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd2006
  • 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd2005
  • 22:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:34 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd2005
  • 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudcephosd2005 to codfw - jhancock@cumin2002"
  • 22:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudcephosd2005 to codfw - jhancock@cumin2002"
  • 22:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:33 krinkle@deploy1003: krinkle: Continuing with sync
  • 22:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-test-master1004.eqiad.wmnet with OS bullseye
  • 22:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-test-master1003.eqiad.wmnet with OS bullseye
  • 22:31 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 22:30 krinkle@deploy1003: krinkle: Backport for multiversion: Remove unused newFromDBName() synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:28 krinkle@deploy1003: Started scap sync-world: Backport for multiversion: Remove unused newFromDBName()
  • 22:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-master1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:28 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-test-master1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-master1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:27 brennen@deploy1003: Started deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling)
  • 22:25 brennen@deploy1003: Finished deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling) (duration: 00m 07s)
  • 22:25 brennen@deploy1003: Started deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling)
  • 22:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-master1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:22 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-test-master1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:21 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-test-master1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-master1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T382778)', diff saved to https://phabricator.wikimedia.org/P78259 and previous config saved to /var/cache/conftool/dbconfig/20250617-222053-ladsgroup.json
  • 22:20 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-test-master1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:18 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T382778)', diff saved to https://phabricator.wikimedia.org/P78258 and previous config saved to /var/cache/conftool/dbconfig/20250617-221737-ladsgroup.json
  • 22:17 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 22:17 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-test-master1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T382778)', diff saved to https://phabricator.wikimedia.org/P78257 and previous config saved to /var/cache/conftool/dbconfig/20250617-221714-ladsgroup.json
  • 22:16 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-master1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:14 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host an-test-master1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:14 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-master1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host an-test-master1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-test-master1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:12 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host an-test-master1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:12 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-test-master1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:06 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-test-master1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:05 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-test-master1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:05 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-test-master1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P78256 and previous config saved to /var/cache/conftool/dbconfig/20250617-220207-ladsgroup.json
  • 22:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host an-test-master1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-test-master1004
  • 21:59 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-test-master1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:59 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host an-test-master1004
  • 21:58 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-test-master1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:52 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-test-master1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:49 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-test-master1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:48 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:47 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:47 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-test-master1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P78255 and previous config saved to /var/cache/conftool/dbconfig/20250617-214659-ladsgroup.json
  • 21:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-test-master1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-test-master1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-test-master1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:37 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-test-master1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:36 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:36 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-test-master1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-test-master1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T382778)', diff saved to https://phabricator.wikimedia.org/P78254 and previous config saved to /var/cache/conftool/dbconfig/20250617-213153-ladsgroup.json
  • 21:31 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-test-master1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:29 brennen@deploy1003: Finished deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling) (duration: 00m 07s)
  • 21:29 brennen@deploy1003: Started deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling)
  • 21:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T382778)', diff saved to https://phabricator.wikimedia.org/P78253 and previous config saved to /var/cache/conftool/dbconfig/20250617-212835-ladsgroup.json
  • 21:28 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 21:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T382778)', diff saved to https://phabricator.wikimedia.org/P78252 and previous config saved to /var/cache/conftool/dbconfig/20250617-212813-ladsgroup.json
  • 21:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-test-master1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:27 brennen@deploy1003: Finished deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling) (duration: 00m 07s)
  • 21:26 brennen@deploy1003: Started deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling)
  • 21:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-test-master1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:25 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-test-master1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:25 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:25 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-test-master1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:24 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:24 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for an-test-coord1002 - jclark@cumin1002"
  • 21:24 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for an-test-coord1002 - jclark@cumin1002"
  • 21:21 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 21:16 cscott@deploy1003: Finished scap sync-world: Backport for stats: Add buckets based on wikitext size; fix increment bug (T393400) (duration: 12m 09s)
  • 21:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P78251 and previous config saved to /var/cache/conftool/dbconfig/20250617-211305-ladsgroup.json
  • 21:09 cscott@deploy1003: cscott: Continuing with sync
  • 21:06 cscott@deploy1003: cscott: Backport for stats: Add buckets based on wikitext size; fix increment bug (T393400) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:03 cscott@deploy1003: Started scap sync-world: Backport for stats: Add buckets based on wikitext size; fix increment bug (T393400)
  • 21:01 cscott@deploy1003: Finished scap sync-world: Backport for stats: Add buckets based on wikitext size; fix increment bug (T393400) (duration: 10m 04s)
  • 20:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P78250 and previous config saved to /var/cache/conftool/dbconfig/20250617-205758-ladsgroup.json
  • 20:54 cscott@deploy1003: cscott: Continuing with sync
  • 20:53 cscott@deploy1003: cscott: Backport for stats: Add buckets based on wikitext size; fix increment bug (T393400) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:51 cscott@deploy1003: Started scap sync-world: Backport for stats: Add buckets based on wikitext size; fix increment bug (T393400)
  • 20:47 jsn@deploy1003: Finished scap sync-world: Backport for Enable new mobile search experience everywhere (not including empty search recommendations) (T393944), undeploy enwiki Patroller Tools surveys (T396250) (duration: 11m 25s)
  • 20:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T382778)', diff saved to https://phabricator.wikimedia.org/P78249 and previous config saved to /var/cache/conftool/dbconfig/20250617-204250-ladsgroup.json
  • 20:40 jsn@deploy1003: bwang, jsn: Continuing with sync
  • 20:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T382778)', diff saved to https://phabricator.wikimedia.org/P78248 and previous config saved to /var/cache/conftool/dbconfig/20250617-203933-ladsgroup.json
  • 20:39 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 20:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T382778)', diff saved to https://phabricator.wikimedia.org/P78247 and previous config saved to /var/cache/conftool/dbconfig/20250617-203910-ladsgroup.json
  • 20:38 jsn@deploy1003: bwang, jsn: Backport for Enable new mobile search experience everywhere (not including empty search recommendations) (T393944), undeploy enwiki Patroller Tools surveys (T396250) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:36 jsn@deploy1003: Started scap sync-world: Backport for Enable new mobile search experience everywhere (not including empty search recommendations) (T393944), undeploy enwiki Patroller Tools surveys (T396250)
  • 20:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
  • 20:32 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2141.codfw.wmnet with reason: marostegui maintenance
  • 20:30 ebernhardson@deploy1003: Finished scap sync-world: Backport for Revert "cirrussearch: return traffic to all DCs" (duration: 09m 59s)
  • 20:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P78246 and previous config saved to /var/cache/conftool/dbconfig/20250617-202403-ladsgroup.json
  • 20:24 ebernhardson@deploy1003: ebernhardson: Continuing with sync
  • 20:23 ebernhardson@deploy1003: ebernhardson: Backport for Revert "cirrussearch: return traffic to all DCs" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:20 ebernhardson@deploy1003: Started scap sync-world: Backport for Revert "cirrussearch: return traffic to all DCs"
  • 20:14 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1186.eqiad.wmnet with OS bullseye
  • 20:09 ebernhardson@deploy1003: Sync cancelled.
  • 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P78245 and previous config saved to /var/cache/conftool/dbconfig/20250617-200854-ladsgroup.json
  • 20:07 ebernhardson@deploy1003: bking, ebernhardson: Backport for cirrussearch: return traffic to all DCs (T388610) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:05 ebernhardson@deploy1003: Started scap sync-world: Backport for cirrussearch: return traffic to all DCs (T388610)
  • 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T382778)', diff saved to https://phabricator.wikimedia.org/P78244 and previous config saved to /var/cache/conftool/dbconfig/20250617-195347-ladsgroup.json
  • 19:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T382778)', diff saved to https://phabricator.wikimedia.org/P78243 and previous config saved to /var/cache/conftool/dbconfig/20250617-195029-ladsgroup.json
  • 19:50 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 19:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T382778)', diff saved to https://phabricator.wikimedia.org/P78242 and previous config saved to /var/cache/conftool/dbconfig/20250617-195017-ladsgroup.json
  • 19:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P78241 and previous config saved to /var/cache/conftool/dbconfig/20250617-193508-ladsgroup.json
  • 19:26 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Adding wikikube-worker-exp2001 - jiji@cumin1002 - T397051"
  • 19:26 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Adding wikikube-worker-exp2001 - jiji@cumin1002 - T397051"
  • 19:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P78240 and previous config saved to /var/cache/conftool/dbconfig/20250617-192001-ladsgroup.json
  • 19:12 brennen@deploy1003: Finished deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling) (duration: 00m 10s)
  • 19:12 brennen@deploy1003: Started deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling)
  • 19:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T382778)', diff saved to https://phabricator.wikimedia.org/P78238 and previous config saved to /var/cache/conftool/dbconfig/20250617-190453-ladsgroup.json
  • 19:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T382778)', diff saved to https://phabricator.wikimedia.org/P78237 and previous config saved to /var/cache/conftool/dbconfig/20250617-190136-ladsgroup.json
  • 19:01 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 19:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T382778)', diff saved to https://phabricator.wikimedia.org/P78236 and previous config saved to /var/cache/conftool/dbconfig/20250617-190113-ladsgroup.json
  • 18:59 brett: Restarting pybal on lvs1016, setting it to primary - T387145
  • 18:53 jgleeson: civicrm upgraded from d592e64c to 63302c18
  • 18:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P78235 and previous config saved to /var/cache/conftool/dbconfig/20250617-184606-ladsgroup.json
  • 18:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P78234 and previous config saved to /var/cache/conftool/dbconfig/20250617-183059-ladsgroup.json
  • 18:24 brennen@deploy1003: Finished deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling) (duration: 00m 18s)
  • 18:24 brennen@deploy1003: Started deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (once more, with feeling)
  • 18:20 brett@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:18 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.6 refs T392176
  • 18:18 brett@cumin2002: START - Cookbook sre.dns.netbox
  • 18:16 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doc1003.eqiad.wmnet
  • 18:16 aokoth@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:16 aokoth@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - aokoth@cumin1002"
  • 18:16 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts lvs1017.eqiad.wmnet
  • 18:16 brett@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:15 aokoth@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doc1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - aokoth@cumin1002"
  • 18:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T382778)', diff saved to https://phabricator.wikimedia.org/P78233 and previous config saved to /var/cache/conftool/dbconfig/20250617-181552-ladsgroup.json
  • 18:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T382778)', diff saved to https://phabricator.wikimedia.org/P78232 and previous config saved to /var/cache/conftool/dbconfig/20250617-181321-ladsgroup.json
  • 18:13 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 18:12 brett@cumin2002: START - Cookbook sre.dns.netbox
  • 18:12 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 18:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T382778)', diff saved to https://phabricator.wikimedia.org/P78231 and previous config saved to /var/cache/conftool/dbconfig/20250617-181144-ladsgroup.json
  • 18:10 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
  • 18:06 aokoth@cumin1002: START - Cookbook sre.dns.netbox
  • 18:02 aokoth@cumin1002: START - Cookbook sre.hosts.decommission for hosts doc1003.eqiad.wmnet
  • 18:01 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1186.eqiad.wmnet with OS bullseye
  • 17:58 brett@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs1017.eqiad.wmnet
  • 17:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P78230 and previous config saved to /var/cache/conftool/dbconfig/20250617-175637-ladsgroup.json
  • 17:56 bvibber@deploy1003: Finished scap sync-world: Backport for Enable JSON transforms for Chart+JsonConfig (T388616) (duration: 10m 47s)
  • 17:53 btullis@dns1004: END - running authdns-update
  • 17:52 btullis@dns1004: START - running authdns-update
  • 17:49 bvibber@deploy1003: bvibber: Continuing with sync
  • 17:48 bvibber@deploy1003: bvibber: Backport for Enable JSON transforms for Chart+JsonConfig (T388616) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 17:45 bvibber@deploy1003: Started scap sync-world: Backport for Enable JSON transforms for Chart+JsonConfig (T388616)
  • 17:44 jiji@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host wikikube-worker-exp2001.codfw.wmnet
  • 17:44 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet with OS bookworm
  • 17:43 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on lvs1017.eqiad.wmnet with reason: T387145
  • 17:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P78229 and previous config saved to /var/cache/conftool/dbconfig/20250617-174130-ladsgroup.json
  • 17:38 brett: stopping pybal on lvs1017 to move traffic over to lvs1020 - T387145
  • 17:37 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
  • 17:36 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:32 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:28 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:28 brennen@deploy1003: Finished deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889 (duration: 00m 23s)
  • 17:27 brennen@deploy1003: Started deploy [phabricator/deployment@f8d7b38]: re-test deploy to phab1005 for T377889
  • 17:26 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker-exp2001.codfw.wmnet with reason: host reimage
  • 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T382778)', diff saved to https://phabricator.wikimedia.org/P78228 and previous config saved to /var/cache/conftool/dbconfig/20250617-172622-ladsgroup.json
  • 17:25 brett: homer "cr*-eqiad*" commit "enable BGP on lvs1016" - T387145
  • 17:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T382778)', diff saved to https://phabricator.wikimedia.org/P78227 and previous config saved to /var/cache/conftool/dbconfig/20250617-172330-ladsgroup.json
  • 17:23 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 17:23 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker-exp2001.codfw.wmnet with reason: host reimage
  • 17:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T382778)', diff saved to https://phabricator.wikimedia.org/P78226 and previous config saved to /var/cache/conftool/dbconfig/20250617-172308-ladsgroup.json
  • 17:21 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:19 swfrench-wmf: migrated shellbox-syntaxhighlight to bookworm-based httpd images - T378128
  • 17:18 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:18 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:15 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1016.eqiad.wmnet with OS bullseye
  • 17:10 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
  • 17:09 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P78225 and previous config saved to /var/cache/conftool/dbconfig/20250617-170800-ladsgroup.json
  • 17:05 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker-exp2001.codfw.wmnet with OS bookworm
  • 17:00 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
  • 17:00 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM wikikube-worker-exp2001.codfw.wmnet - jiji@cumin1002"
  • 16:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:57 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1186
  • 16:57 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1186
  • 16:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:54 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 16:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P78224 and previous config saved to /var/cache/conftool/dbconfig/20250617-165253-ladsgroup.json
  • 16:52 ejegg: civicrm upgraded from ec2cd980 to d592e64c
  • 16:47 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM wikikube-worker-exp2001.codfw.wmnet - jiji@cumin1002"
  • 16:47 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2024.codfw.wmnet
  • 16:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T382778)', diff saved to https://phabricator.wikimedia.org/P78223 and previous config saved to /var/cache/conftool/dbconfig/20250617-163746-ladsgroup.json
  • 16:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T382778)', diff saved to https://phabricator.wikimedia.org/P78222 and previous config saved to /var/cache/conftool/dbconfig/20250617-163434-ladsgroup.json
  • 16:34 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 16:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T382778)', diff saved to https://phabricator.wikimedia.org/P78221 and previous config saved to /var/cache/conftool/dbconfig/20250617-163412-ladsgroup.json
  • 16:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T396130)', diff saved to https://phabricator.wikimedia.org/P78220 and previous config saved to /var/cache/conftool/dbconfig/20250617-163231-marostegui.json
  • 16:25 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1016.eqiad.wmnet with reason: host reimage
  • 16:25 cdobbins@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp30[66-81].esams.wmnet} and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
  • 16:24 ChrisDobbins901_: cdobbins@cumin2002:~$ sudo -i cookbook sre.cdn.roll-upgrade-varnish --query 'P{cp30[66-81].esams.wmnet}' --reason 'Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0' --task-id T396581
  • 16:24 ChrisDobbins901_: cdobbins@cumin2002:~$ sudo -i cookbook --dry-run sre.cdn.roll-upgrade-varnish --query 'P{cp30[66-81].esams.wmnet}' --reason 'Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0' --task-id T396581
  • 16:22 root@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-backup2001.codfw.wmnet: Renew puppet certificate - root@cumin1002
  • 16:22 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1016.eqiad.wmnet with reason: host reimage
  • 16:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P78218 and previous config saved to /var/cache/conftool/dbconfig/20250617-161904-ladsgroup.json
  • 16:18 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade of ATS on A:ulsfo and A:cp - 9.2.10 upgrade (T390912)
  • 16:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P78217 and previous config saved to /var/cache/conftool/dbconfig/20250617-161723-marostegui.json
  • 16:08 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bullseye
  • 16:05 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:05 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P78216 and previous config saved to /var/cache/conftool/dbconfig/20250617-160357-ladsgroup.json
  • 16:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P78215 and previous config saved to /var/cache/conftool/dbconfig/20250617-160216-marostegui.json
  • 16:01 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-drmrs and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
  • 15:54 cdobbins@dns1004: END - running authdns-update
  • 15:53 cdobbins@dns1004: START - running authdns-update
  • 15:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T382778)', diff saved to https://phabricator.wikimedia.org/P78214 and previous config saved to /var/cache/conftool/dbconfig/20250617-154850-ladsgroup.json
  • 15:47 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T396130)', diff saved to https://phabricator.wikimedia.org/P78213 and previous config saved to /var/cache/conftool/dbconfig/20250617-154709-marostegui.json
  • 15:47 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T382778)', diff saved to https://phabricator.wikimedia.org/P78212 and previous config saved to /var/cache/conftool/dbconfig/20250617-154542-ladsgroup.json
  • 15:45 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T382778)', diff saved to https://phabricator.wikimedia.org/P78211 and previous config saved to /var/cache/conftool/dbconfig/20250617-154520-ladsgroup.json
  • 15:33 effie: stopping puppet on A:wikikube-worker and A:eqiad
  • 15:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P78210 and previous config saved to /var/cache/conftool/dbconfig/20250617-153013-ladsgroup.json
  • 15:29 dancy@deploy1003: Finished scap sync-world: Testing T396166 (duration: 03m 46s)
  • 15:28 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:28 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:28 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-backup2001.codfw.wmnet with reason: Maintenance and reboot
  • 15:26 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C
  • 15:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T396130)', diff saved to https://phabricator.wikimedia.org/P78209 and previous config saved to /var/cache/conftool/dbconfig/20250617-152555-marostegui.json
  • 15:25 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 15:25 dancy@deploy1003: Started scap sync-world: Testing T396166
  • 15:24 dancy@deploy1003: Installation of scap version "4.177.0" completed for 2 hosts
  • 15:24 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum7004.magru.wmnet
  • 15:24 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum7004.magru.wmnet with OS bookworm
  • 15:23 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:23 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:22 dancy@deploy1003: Installing scap version "4.177.0" for 2 host(s)
  • 15:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P78208 and previous config saved to /var/cache/conftool/dbconfig/20250617-151505-ladsgroup.json
  • 15:08 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 15:07 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum7004.magru.wmnet with reason: host reimage
  • 15:04 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum7004.magru.wmnet with reason: host reimage
  • 15:02 Lucas_WMDE: extra-long UTC afternoon backport+config window done
  • 15:02 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for stats: Add buckets based on wikitext size; fix increment bug (T393400), debug.json: add mw-experimental hosts (T276994) (duration: 15m 59s)
  • 15:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T382778)', diff saved to https://phabricator.wikimedia.org/P78207 and previous config saved to /var/cache/conftool/dbconfig/20250617-145958-ladsgroup.json
  • 14:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T382778)', diff saved to https://phabricator.wikimedia.org/P78206 and previous config saved to /var/cache/conftool/dbconfig/20250617-145642-ladsgroup.json
  • 14:56 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 14:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T382778)', diff saved to https://phabricator.wikimedia.org/P78205 and previous config saved to /var/cache/conftool/dbconfig/20250617-145619-ladsgroup.json
  • 14:55 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jiji, cscott: Continuing with sync
  • 14:52 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade of ATS on A:ulsfo and A:cp - 9.2.10 upgrade (T390912)
  • 14:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 14:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T396130)', diff saved to https://phabricator.wikimedia.org/P78204 and previous config saved to /var/cache/conftool/dbconfig/20250617-145052-marostegui.json
  • 14:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78203 and previous config saved to /var/cache/conftool/dbconfig/20250617-144918-root.json
  • 14:49 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jiji, cscott: Backport for stats: Add buckets based on wikitext size; fix increment bug (T393400), debug.json: add mw-experimental hosts (T276994) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:46 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for stats: Add buckets based on wikitext size; fix increment bug (T393400), debug.json: add mw-experimental hosts (T276994)
  • 14:45 dancy@deploy1003: Installation of scap version "4.176.0" completed for 2 hosts
  • 14:44 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 14:44 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 14:43 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 14:43 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 14:43 dancy@deploy1003: Installing scap version "4.176.0" for 2 host(s)
  • 14:42 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade of ATS on A:magru and not P{cp7002*} and A:cp - 9.2.10 upgrade (T390912)
  • 14:42 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum7002.magru.wmnet with OS bookworm
  • 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P78202 and previous config saved to /var/cache/conftool/dbconfig/20250617-144112-ladsgroup.json
  • 14:40 jiji@dns1004: END - running authdns-update
  • 14:39 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host durum7004.magru.wmnet with OS bookworm
  • 14:38 jiji@dns1004: START - running authdns-update
  • 14:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P78201 and previous config saved to /var/cache/conftool/dbconfig/20250617-143545-marostegui.json
  • 14:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78200 and previous config saved to /var/cache/conftool/dbconfig/20250617-143413-root.json
  • 14:33 sbisson@deploy1003: Finished scap sync-world: Backport for CX3 Build 1.0.0+20250616 (T374695 T395415 T396628 T396711 T396716 T396836) (duration: 11m 21s)
  • 14:32 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum7004.magru.wmnet - jmm@cumin1003"
  • 14:32 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum7004.magru.wmnet - jmm@cumin1003"
  • 14:31 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum7004.magru.wmnet on all recursors
  • 14:31 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache durum7004.magru.wmnet on all recursors
  • 14:31 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:31 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum7004.magru.wmnet - jmm@cumin1003"
  • 14:31 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum7004.magru.wmnet - jmm@cumin1003"
  • 14:30 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6002.drmrs.wmnet with OS bookworm
  • 14:28 marostegui@dns1006: END - running authdns-update
  • 14:27 marostegui@dns1006: START - running authdns-update
  • 14:26 jmm@cumin1003: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C
  • 14:26 sbisson@deploy1003: sbisson: Continuing with sync
  • 14:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P78199 and previous config saved to /var/cache/conftool/dbconfig/20250617-142605-ladsgroup.json
  • 14:25 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 14:25 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host durum7004.magru.wmnet
  • 14:24 sbisson@deploy1003: sbisson: Backport for CX3 Build 1.0.0+20250616 (T374695 T395415 T396628 T396711 T396716 T396836) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:22 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum7002.magru.wmnet with reason: host reimage
  • 14:22 sbisson@deploy1003: Started scap sync-world: Backport for CX3 Build 1.0.0+20250616 (T374695 T395415 T396628 T396711 T396716 T396836)
  • 14:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P78198 and previous config saved to /var/cache/conftool/dbconfig/20250617-142037-marostegui.json
  • 14:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78197 and previous config saved to /var/cache/conftool/dbconfig/20250617-141907-root.json
  • 14:18 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum7002.magru.wmnet with reason: host reimage
  • 14:17 logmsgbot: ihurbain Deployed security patch for T397127
  • 14:15 volans: uploaded python3-wmflib_2.0.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia,bookworm-wikimedia,trixie-wikimedia
  • 14:11 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T382778)', diff saved to https://phabricator.wikimedia.org/P78196 and previous config saved to /var/cache/conftool/dbconfig/20250617-141058-ladsgroup.json
  • 14:08 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 14:08 logmsgbot: ihurbain Deployed security patch for T397127
  • 14:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T382778)', diff saved to https://phabricator.wikimedia.org/P78195 and previous config saved to /var/cache/conftool/dbconfig/20250617-140741-ladsgroup.json
  • 14:07 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 14:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T382778)', diff saved to https://phabricator.wikimedia.org/P78194 and previous config saved to /var/cache/conftool/dbconfig/20250617-140718-ladsgroup.json
  • 14:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T396130)', diff saved to https://phabricator.wikimedia.org/P78193 and previous config saved to /var/cache/conftool/dbconfig/20250617-140530-marostegui.json
  • 14:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78192 and previous config saved to /var/cache/conftool/dbconfig/20250617-140402-root.json
  • 14:03 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5002.eqsin.wmnet with OS bookworm
  • 13:59 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgrading P{lvs[7001-7002].magru.wmnet} and A:liberica (T397053)
  • 13:59 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P{lvs[7001-7002].magru.wmnet} and A:liberica
  • 13:59 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin pooling P{lvs[7001-7002].magru.wmnet} and A:liberica
  • 13:58 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs[7001-7002].magru.wmnet} and A:liberica
  • 13:58 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 13:58 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin depooling P{lvs[7001-7002].magru.wmnet} and A:liberica
  • 13:58 kcvelaga@deploy1003: Finished deploy [airflow-dags/analytics_product@90a716a]: T365813 (duration: 01m 21s)
  • 13:58 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P{lvs[7001-7002].magru.wmnet} and A:liberica
  • 13:57 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin pooling P{lvs[7001-7002].magru.wmnet} and A:liberica
  • 13:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1165 T395989', diff saved to https://phabricator.wikimedia.org/P78191 and previous config saved to /var/cache/conftool/dbconfig/20250617-135706-marostegui.json
  • 13:57 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs[7001-7002].magru.wmnet} and A:liberica
  • 13:56 kcvelaga@deploy1003: Started deploy [airflow-dags/analytics_product@90a716a]: T365813
  • 13:56 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin depooling P{lvs[7001-7002].magru.wmnet} and A:liberica
  • 13:56 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgrading P{lvs[7001-7002].magru.wmnet} and A:liberica (T397053)
  • 13:55 jiji@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker-exp2001.codfw.wmnet on all recursors
  • 13:55 jiji@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker-exp2001.codfw.wmnet on all recursors
  • 13:55 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:55 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM wikikube-worker-exp2001.codfw.wmnet - jiji@cumin1002"
  • 13:55 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM wikikube-worker-exp2001.codfw.wmnet - jiji@cumin1002"
  • 13:54 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Adding wikikube-worker-exp1001 - jiji@cumin1002 - T397051"
  • 13:54 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Adding wikikube-worker-exp1001 - jiji@cumin1002 - T397051"
  • 13:53 tgr@deploy1003: Finished scap sync-world: Backport for Revert "Use GetSecurityLogContext hook for goodpass/badpass logging" (T395204) (duration: 10m 24s)
  • 13:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P78190 and previous config saved to /var/cache/conftool/dbconfig/20250617-135211-ladsgroup.json
  • 13:50 tgr: broke login for ~30 min by deploying the wrong patch (T395204)
  • 13:50 jiji@cumin1002: START - Cookbook sre.dns.netbox
  • 13:50 jiji@cumin1002: START - Cookbook sre.ganeti.makevm for new host wikikube-worker-exp2001.codfw.wmnet
  • 13:49 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host durum7002.magru.wmnet with OS bookworm
  • 13:46 tgr@deploy1003: tgr: Continuing with sync
  • 13:46 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:46 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host durum6002.drmrs.wmnet with OS bookworm
  • 13:46 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:45 tgr@deploy1003: tgr: Backport for Revert "Use GetSecurityLogContext hook for goodpass/badpass logging" (T395204) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T396130)', diff saved to https://phabricator.wikimedia.org/P78189 and previous config saved to /var/cache/conftool/dbconfig/20250617-134432-marostegui.json
  • 13:44 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 13:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T396130)', diff saved to https://phabricator.wikimedia.org/P78188 and previous config saved to /var/cache/conftool/dbconfig/20250617-134409-marostegui.json
  • 13:43 tgr@deploy1003: Started scap sync-world: Backport for Revert "Use GetSecurityLogContext hook for goodpass/badpass logging" (T395204)
  • 13:42 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6001.drmrs.wmnet with OS bookworm
  • 13:38 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5002.eqsin.wmnet with reason: host reimage
  • 13:38 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet
  • 13:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P78187 and previous config saved to /var/cache/conftool/dbconfig/20250617-133704-ladsgroup.json
  • 13:35 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5002.eqsin.wmnet with reason: host reimage
  • 13:35 tgr@deploy1003: Finished scap sync-world: Backport for Fix GetSecurityLogContext hook declaration (T395204) (duration: 11m 47s)
  • 13:33 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting A:liberica-canary (T397053)
  • 13:33 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling A:liberica-canary
  • 13:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet
  • 13:32 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin pooling A:liberica-canary
  • 13:32 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling A:liberica-canary
  • 13:32 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin depooling A:liberica-canary
  • 13:32 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade restarting A:liberica-canary (T397053)
  • 13:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P78186 and previous config saved to /var/cache/conftool/dbconfig/20250617-132902-marostegui.json
  • 13:28 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting A:liberica-canary (T397053)
  • 13:28 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling A:liberica-canary
  • 13:28 tgr@deploy1003: tgr: Continuing with sync
  • 13:27 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:27 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin pooling A:liberica-canary
  • 13:27 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling A:liberica-canary
  • 13:27 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin depooling A:liberica-canary
  • 13:27 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade restarting A:liberica-canary (T397053)
  • 13:27 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:25 tgr@deploy1003: tgr: Backport for Fix GetSecurityLogContext hook declaration (T395204) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:23 tgr@deploy1003: Started scap sync-world: Backport for Fix GetSecurityLogContext hook declaration (T395204)
  • 13:23 jiji@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host wikikube-worker-exp1001.eqiad.wmnet
  • 13:22 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker-exp1001.eqiad.wmnet with OS bookworm
  • 13:22 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 13:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T382778)', diff saved to https://phabricator.wikimedia.org/P78185 and previous config saved to /var/cache/conftool/dbconfig/20250617-132157-ladsgroup.json
  • 13:19 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting A:liberica-canary (T397053)
  • 13:19 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling A:liberica-canary
  • 13:19 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin pooling A:liberica-canary
  • 13:19 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling A:liberica-canary
  • 13:18 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin depooling A:liberica-canary
  • 13:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 22:00:00 on 10 hosts with reason: Maintenance
  • 13:18 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade restarting A:liberica-canary (T397053)
  • 13:18 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 13:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T382778)', diff saved to https://phabricator.wikimedia.org/P78184 and previous config saved to /var/cache/conftool/dbconfig/20250617-131824-ladsgroup.json
  • 13:18 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 13:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T382778)', diff saved to https://phabricator.wikimedia.org/P78183 and previous config saved to /var/cache/conftool/dbconfig/20250617-131803-ladsgroup.json
  • 13:17 tgr@deploy1003: Sync cancelled.
  • 13:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P78182 and previous config saved to /var/cache/conftool/dbconfig/20250617-131354-marostegui.json
  • 13:12 tgr@deploy1003: tgr: Backport for Use GetSecurityLogContext hook for goodpass/badpass logging (T395204) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:10 tgr@deploy1003: Started scap sync-world: Backport for Use GetSecurityLogContext hook for goodpass/badpass logging (T395204)
  • 13:10 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade of ATS on A:magru and not P{cp7002*} and A:cp - 9.2.10 upgrade (T390912)
  • 13:06 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker-exp1001.eqiad.wmnet with reason: host reimage
  • 13:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P78180 and previous config saved to /var/cache/conftool/dbconfig/20250617-130256-ladsgroup.json
  • 13:02 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker-exp1001.eqiad.wmnet with reason: host reimage
  • 12:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T396130)', diff saved to https://phabricator.wikimedia.org/P78179 and previous config saved to /var/cache/conftool/dbconfig/20250617-125847-marostegui.json
  • 12:56 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host durum5002.eqsin.wmnet with OS bookworm
  • 12:56 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host durum6001.drmrs.wmnet with OS bookworm
  • 12:50 taavi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:50 taavi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add eqiad1 auth v6 VIPs - taavi@cumin1003"
  • 12:50 taavi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add eqiad1 auth v6 VIPs - taavi@cumin1003"
  • 12:50 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker-exp1001.eqiad.wmnet with OS bookworm
  • 12:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P78178 and previous config saved to /var/cache/conftool/dbconfig/20250617-124748-ladsgroup.json
  • 12:47 taavi@cumin1003: START - Cookbook sre.dns.netbox
  • 12:35 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM wikikube-worker-exp1001.eqiad.wmnet - jiji@cumin1002"
  • 12:35 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM wikikube-worker-exp1001.eqiad.wmnet - jiji@cumin1002"
  • 12:34 jiji@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker-exp1001.eqiad.wmnet on all recursors
  • 12:34 jiji@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker-exp1001.eqiad.wmnet on all recursors
  • 12:34 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:34 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM wikikube-worker-exp1001.eqiad.wmnet - jiji@cumin1002"
  • 12:34 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM wikikube-worker-exp1001.eqiad.wmnet - jiji@cumin1002"
  • 12:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T396130)', diff saved to https://phabricator.wikimedia.org/P78177 and previous config saved to /var/cache/conftool/dbconfig/20250617-123334-marostegui.json
  • 12:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 12:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T396130)', diff saved to https://phabricator.wikimedia.org/P78176 and previous config saved to /var/cache/conftool/dbconfig/20250617-123312-marostegui.json
  • 12:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T382778)', diff saved to https://phabricator.wikimedia.org/P78175 and previous config saved to /var/cache/conftool/dbconfig/20250617-123241-ladsgroup.json
  • 12:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T382778)', diff saved to https://phabricator.wikimedia.org/P78174 and previous config saved to /var/cache/conftool/dbconfig/20250617-122942-ladsgroup.json
  • 12:29 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 12:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T382778)', diff saved to https://phabricator.wikimedia.org/P78173 and previous config saved to /var/cache/conftool/dbconfig/20250617-122919-ladsgroup.json
  • 12:28 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2024.codfw.wmnet
  • 12:26 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2006.codfw.wmnet with OS bullseye
  • 12:26 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
  • 12:25 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2024.codfw.wmnet
  • 12:25 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
  • 12:24 samtar@deploy1003: Finished scap sync-world: Backport for IS: Enable `wgTemplateDataEnableDiscovery` for mediawikiwiki (T377975) (duration: 10m 42s)
  • 12:21 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2024.codfw.wmnet
  • 12:20 jmm@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2023.codfw.wmnet with reason: remove for decom
  • 12:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P78172 and previous config saved to /var/cache/conftool/dbconfig/20250617-121805-marostegui.json
  • 12:18 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
  • 12:17 samtar@deploy1003: samtar: Continuing with sync
  • 12:16 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
  • 12:16 samtar@deploy1003: samtar: Backport for IS: Enable `wgTemplateDataEnableDiscovery` for mediawikiwiki (T377975) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P78171 and previous config saved to /var/cache/conftool/dbconfig/20250617-121412-ladsgroup.json
  • 12:14 samtar@deploy1003: Started scap sync-world: Backport for IS: Enable `wgTemplateDataEnableDiscovery` for mediawikiwiki (T377975)
  • 12:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2006.codfw.wmnet with reason: host reimage
  • 12:09 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 12:08 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2006.codfw.wmnet with reason: host reimage
  • 12:06 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 12:06 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
  • 12:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P78170 and previous config saved to /var/cache/conftool/dbconfig/20250617-120257-marostegui.json
  • 11:59 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
  • 11:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P78169 and previous config saved to /var/cache/conftool/dbconfig/20250617-115905-ladsgroup.json
  • 11:56 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
  • 11:48 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2006.codfw.wmnet with OS bullseye
  • 11:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T396130)', diff saved to https://phabricator.wikimedia.org/P78168 and previous config saved to /var/cache/conftool/dbconfig/20250617-114750-marostegui.json
  • 11:47 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 11:46 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 11:46 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 11:46 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 11:45 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 11:45 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 11:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T382778)', diff saved to https://phabricator.wikimedia.org/P78167 and previous config saved to /var/cache/conftool/dbconfig/20250617-114357-ladsgroup.json
  • 11:41 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 11:41 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 11:40 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T382778)', diff saved to https://phabricator.wikimedia.org/P78166 and previous config saved to /var/cache/conftool/dbconfig/20250617-114037-ladsgroup.json
  • 11:40 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 11:40 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T382778)', diff saved to https://phabricator.wikimedia.org/P78165 and previous config saved to /var/cache/conftool/dbconfig/20250617-113915-ladsgroup.json
  • 11:38 jiji@cumin1002: START - Cookbook sre.dns.netbox
  • 11:38 jiji@cumin1002: START - Cookbook sre.ganeti.makevm for new host wikikube-worker-exp1001.eqiad.wmnet
  • 11:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P78164 and previous config saved to /var/cache/conftool/dbconfig/20250617-112408-ladsgroup.json
  • 11:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T396130)', diff saved to https://phabricator.wikimedia.org/P78163 and previous config saved to /var/cache/conftool/dbconfig/20250617-112222-marostegui.json
  • 11:22 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 11:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T396130)', diff saved to https://phabricator.wikimedia.org/P78162 and previous config saved to /var/cache/conftool/dbconfig/20250617-112200-marostegui.json
  • 11:20 samtar@deploy1003: Finished scap sync-world: Backport for InitialiseSettings: wgTemplateDataEnableDiscovery on more wikis (T377975) (duration: 11m 36s)
  • 11:13 samtar@deploy1003: samwilson, samtar: Continuing with sync
  • 11:10 samtar@deploy1003: samwilson, samtar: Backport for InitialiseSettings: wgTemplateDataEnableDiscovery on more wikis (T377975) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P78161 and previous config saved to /var/cache/conftool/dbconfig/20250617-110900-ladsgroup.json
  • 11:08 samtar@deploy1003: Started scap sync-world: Backport for InitialiseSettings: wgTemplateDataEnableDiscovery on more wikis (T377975)
  • 11:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P78160 and previous config saved to /var/cache/conftool/dbconfig/20250617-110652-marostegui.json
  • 11:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgrading A:liberica-canary (T397053)
  • 11:00 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling A:liberica-canary
  • 11:00 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin pooling A:liberica-canary
  • 11:00 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling A:liberica-canary
  • 11:00 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin depooling A:liberica-canary
  • 11:00 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgrading A:liberica-canary (T397053)
  • 11:00 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet
  • 10:59 vgutierrez: upload liberica 0.20 to apt.wm.o (bookworm-wikimedia) - T397053
  • 10:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T382778)', diff saved to https://phabricator.wikimedia.org/P78159 and previous config saved to /var/cache/conftool/dbconfig/20250617-105353-ladsgroup.json
  • 10:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P78158 and previous config saved to /var/cache/conftool/dbconfig/20250617-105145-marostegui.json
  • 10:51 hnowlan: migrate transform APIs for wikitext<->html out of restbase
  • 10:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T382778)', diff saved to https://phabricator.wikimedia.org/P78157 and previous config saved to /var/cache/conftool/dbconfig/20250617-105028-ladsgroup.json
  • 10:50 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:50 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 10:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T396130)', diff saved to https://phabricator.wikimedia.org/P78156 and previous config saved to /var/cache/conftool/dbconfig/20250617-103638-marostegui.json
  • 10:32 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[1217,1250].eqiad.wmnet with reason: Maintenance
  • 10:32 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[1217,1250].eqiad.wmnet with reason: Maintenance
  • 10:30 fnegri@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1018.eqiad.wmnet with reason: Upgrading clouddbs T394372
  • 10:29 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet
  • 10:28 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1014.eqiad.wmnet
  • 10:26 marostegui@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts db1250.eqiad.wmnet
  • 10:26 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host db1250.eqiad.wmnet
  • 10:20 urbanecm@deploy1003: Finished scap sync-world: Backport for fix: Gauge metrics use `::set` not `::observe` (T397135), fix: Gauge metrics use `::set` not `::observe` (T397135) (duration: 20m 12s)
  • 10:19 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting A:liberica-canary (T397053)
  • 10:19 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling A:liberica-canary
  • 10:19 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin pooling A:liberica-canary
  • 10:19 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling A:liberica-canary
  • 10:19 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin depooling A:liberica-canary
  • 10:19 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade restarting A:liberica-canary (T397053)
  • 10:17 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
  • 10:16 marostegui@cumin2002: START - Cookbook sre.hosts.reboot-single for host db1250.eqiad.wmnet
  • 10:16 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 10:15 marostegui@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1250.eqiad.wmnet
  • 10:15 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
  • 10:15 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 10:15 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 10:14 marostegui@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts db1250.eqiad.wmnet
  • 10:14 marostegui@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1250.eqiad.wmnet
  • 10:14 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 10:13 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 10:13 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 10:13 urbanecm@deploy1003: urbanecm: Continuing with sync
  • 10:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T396130)', diff saved to https://phabricator.wikimedia.org/P78153 and previous config saved to /var/cache/conftool/dbconfig/20250617-101123-marostegui.json
  • 10:11 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1014.eqiad.wmnet
  • 10:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 10:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T396130)', diff saved to https://phabricator.wikimedia.org/P78152 and previous config saved to /var/cache/conftool/dbconfig/20250617-101100-marostegui.json
  • 10:10 fnegri@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1014.eqiad.wmnet with reason: Upgrading clouddbs T394372
  • 10:08 marostegui@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts db1250.eqiad.wmnet
  • 10:07 marostegui@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1250.eqiad.wmnet
  • 10:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1250.eqiad.wmnet with reason: Maintenance
  • 10:03 marostegui: Failover m1 from db1250 to db1207 - T396706
  • 10:02 urbanecm@deploy1003: urbanecm: Backport for fix: Gauge metrics use `::set` not `::observe` (T397135), fix: Gauge metrics use `::set` not `::observe` (T397135) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:00 urbanecm@deploy1003: Started scap sync-world: Backport for fix: Gauge metrics use `::set` not `::observe` (T397135), fix: Gauge metrics use `::set` not `::observe` (T397135)
  • 09:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T395241)', diff saved to https://phabricator.wikimedia.org/P78151 and previous config saved to /var/cache/conftool/dbconfig/20250617-095927-fceratto.json
  • 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P78150 and previous config saved to /var/cache/conftool/dbconfig/20250617-095553-marostegui.json
  • 09:53 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet
  • 09:53 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2232].codfw.wmnet,db[1207,1217,1250].eqiad.wmnet with reason: Primary switchover m1 T396706
  • 09:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P78149 and previous config saved to /var/cache/conftool/dbconfig/20250617-094419-fceratto.json
  • 09:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P78148 and previous config saved to /var/cache/conftool/dbconfig/20250617-094045-marostegui.json
  • 09:30 fnegri@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1017.eqiad.wmnet with reason: Upgrading clouddbs T394372
  • 09:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P78147 and previous config saved to /var/cache/conftool/dbconfig/20250617-092912-fceratto.json
  • 09:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T396130)', diff saved to https://phabricator.wikimedia.org/P78146 and previous config saved to /var/cache/conftool/dbconfig/20250617-092538-marostegui.json
  • 09:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T395241)', diff saved to https://phabricator.wikimedia.org/P78145 and previous config saved to /var/cache/conftool/dbconfig/20250617-091404-fceratto.json
  • 09:07 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2007.codfw.wmnet with OS bullseye
  • 09:07 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
  • 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
  • 09:05 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T395241)', diff saved to https://phabricator.wikimedia.org/P78144 and previous config saved to /var/cache/conftool/dbconfig/20250617-090551-fceratto.json
  • 09:05 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:05 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 09:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T396130)', diff saved to https://phabricator.wikimedia.org/P78143 and previous config saved to /var/cache/conftool/dbconfig/20250617-090028-marostegui.json
  • 09:00 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 09:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T396130)', diff saved to https://phabricator.wikimedia.org/P78142 and previous config saved to /var/cache/conftool/dbconfig/20250617-090006-marostegui.json
  • 08:53 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ncredir7002.magru.wmnet
  • 08:53 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:53 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir7002.magru.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 08:52 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner2004.codfw.wmnet with OS bookworm
  • 08:51 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir7002.magru.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 08:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2007.codfw.wmnet with reason: host reimage
  • 08:45 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 08:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P78141 and previous config saved to /var/cache/conftool/dbconfig/20250617-084458-marostegui.json
  • 08:42 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2007.codfw.wmnet with reason: host reimage
  • 08:40 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts ncredir7002.magru.wmnet
  • 08:33 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner2004.codfw.wmnet with reason: host reimage
  • 08:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P78140 and previous config saved to /var/cache/conftool/dbconfig/20250617-082951-marostegui.json
  • 08:27 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner2004.codfw.wmnet with reason: host reimage
  • 08:21 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2007.codfw.wmnet with OS bullseye
  • 08:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T396130)', diff saved to https://phabricator.wikimedia.org/P78139 and previous config saved to /var/cache/conftool/dbconfig/20250617-081443-marostegui.json
  • 08:09 jelto@cumin1002: START - Cookbook sre.hosts.reimage for host gitlab-runner2004.codfw.wmnet with OS bookworm
  • 08:07 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner2003.codfw.wmnet with OS bookworm
  • 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for feat(LevelingUp): Measure the delay between actual and intended notification timestamp (T395260) (duration: 10m 22s)
  • 08:04 moritzm: installing mariadb security updates (as shipped in Debian, not the wmf-mariadb packages we use for the main mariadb clusters)
  • 07:57 urbanecm@deploy1003: urbanecm: Continuing with sync
  • 07:56 urbanecm@deploy1003: urbanecm: Backport for feat(LevelingUp): Measure the delay between actual and intended notification timestamp (T395260) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:54 urbanecm@deploy1003: Started scap sync-world: Backport for feat(LevelingUp): Measure the delay between actual and intended notification timestamp (T395260)
  • 07:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T396130)', diff saved to https://phabricator.wikimedia.org/P78138 and previous config saved to /var/cache/conftool/dbconfig/20250617-074920-marostegui.json
  • 07:49 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 07:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T396130)', diff saved to https://phabricator.wikimedia.org/P78137 and previous config saved to /var/cache/conftool/dbconfig/20250617-074857-marostegui.json
  • 07:48 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner2003.codfw.wmnet with reason: host reimage
  • 07:44 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner2003.codfw.wmnet with reason: host reimage
  • 07:40 tchanders@deploy1003: Finished scap sync-world: Backport for temp accounts: Enable temp account creation on three wikis (T396464) (duration: 17m 32s)
  • 07:34 moritzm: installing python3.11 security updates
  • 07:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P78136 and previous config saved to /var/cache/conftool/dbconfig/20250617-073350-marostegui.json
  • 07:33 tchanders@deploy1003: tchanders: Continuing with sync
  • 07:26 jelto@cumin1002: START - Cookbook sre.hosts.reimage for host gitlab-runner2003.codfw.wmnet with OS bookworm
  • 07:25 tchanders@deploy1003: tchanders: Backport for temp accounts: Enable temp account creation on three wikis (T396464) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:22 tchanders@deploy1003: Started scap sync-world: Backport for temp accounts: Enable temp account creation on three wikis (T396464)
  • 07:21 marostegui@cumin1002: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78135 and previous config saved to /var/cache/conftool/dbconfig/20250617-072127-root.json
  • 07:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P78134 and previous config saved to /var/cache/conftool/dbconfig/20250617-071842-marostegui.json
  • 07:18 kartik@deploy1003: Finished scap sync-world: Backport for Enable the Contribute menu (6th group) (T380930) (duration: 14m 49s)
  • 07:12 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of aux-k8s-etcd2003.codfw.wmnet to plain
  • 07:11 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 07:11 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of aux-k8s-etcd2003.codfw.wmnet to plain
  • 07:11 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
  • 07:10 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 07:09 kartik@deploy1003: kartik: Continuing with sync
  • 07:08 kartik@deploy1003: kartik: Backport for Enable the Contribute menu (6th group) (T380930) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:06 marostegui@cumin1002: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78133 and previous config saved to /var/cache/conftool/dbconfig/20250617-070621-root.json
  • 07:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T396130)', diff saved to https://phabricator.wikimedia.org/P78132 and previous config saved to /var/cache/conftool/dbconfig/20250617-070334-marostegui.json
  • 07:03 kartik@deploy1003: Started scap sync-world: Backport for Enable the Contribute menu (6th group) (T380930)
  • 07:02 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of aux-k8s-etcd2003.codfw.wmnet to drbd
  • 07:00 jmm@puppetserver1001: conftool action : set/pooled=no; selector: name=ncredir7002.magru.wmnet
  • 06:59 marostegui@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78131 and previous config saved to /var/cache/conftool/dbconfig/20250617-065914-root.json
  • 06:56 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78130 and previous config saved to /var/cache/conftool/dbconfig/20250617-065610-root.json
  • 06:51 marostegui@cumin1002: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78129 and previous config saved to /var/cache/conftool/dbconfig/20250617-065115-root.json
  • 06:44 marostegui@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78128 and previous config saved to /var/cache/conftool/dbconfig/20250617-064408-root.json
  • 06:44 jmm@puppetserver1001: conftool action : set/pooled=yes; selector: name=ncredir7004.magru.wmnet
  • 06:43 jmm@puppetserver1001: conftool action : set/weight=1; selector: name=ncredir7004.magru.wmnet
  • 06:41 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78127 and previous config saved to /var/cache/conftool/dbconfig/20250617-064104-root.json
  • 06:39 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of aux-k8s-etcd2003.codfw.wmnet to drbd
  • 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T396130)', diff saved to https://phabricator.wikimedia.org/P78126 and previous config saved to /var/cache/conftool/dbconfig/20250617-063803-marostegui.json
  • 06:37 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 06:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T396130)', diff saved to https://phabricator.wikimedia.org/P78125 and previous config saved to /var/cache/conftool/dbconfig/20250617-063740-marostegui.json
  • 06:36 marostegui@cumin1002: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78124 and previous config saved to /var/cache/conftool/dbconfig/20250617-063610-root.json
  • 06:34 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
  • 06:30 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 06:29 marostegui@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78123 and previous config saved to /var/cache/conftool/dbconfig/20250617-062902-root.json
  • 06:25 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78122 and previous config saved to /var/cache/conftool/dbconfig/20250617-062558-root.json
  • 06:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P78121 and previous config saved to /var/cache/conftool/dbconfig/20250617-062233-marostegui.json
  • 06:21 marostegui@cumin1002: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P78120 and previous config saved to /var/cache/conftool/dbconfig/20250617-062104-root.json
  • 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78119 and previous config saved to /var/cache/conftool/dbconfig/20250617-061635-root.json
  • 06:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1030.eqiad.wmnet with reason: Maintenance
  • 06:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1030', diff saved to https://phabricator.wikimedia.org/P78118 and previous config saved to /var/cache/conftool/dbconfig/20250617-061441-marostegui.json
  • 06:13 marostegui@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78117 and previous config saved to /var/cache/conftool/dbconfig/20250617-061356-root.json
  • 06:10 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78116 and previous config saved to /var/cache/conftool/dbconfig/20250617-061052-root.json
  • 06:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P78115 and previous config saved to /var/cache/conftool/dbconfig/20250617-060725-marostegui.json
  • 06:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1029.eqiad.wmnet with reason: Maintenance
  • 06:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1029', diff saved to https://phabricator.wikimedia.org/P78114 and previous config saved to /var/cache/conftool/dbconfig/20250617-060640-marostegui.json
  • 06:04 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78113 and previous config saved to /var/cache/conftool/dbconfig/20250617-060444-root.json
  • 06:04 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1028.eqiad.wmnet with reason: Maintenance
  • 06:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1028', diff saved to https://phabricator.wikimedia.org/P78112 and previous config saved to /var/cache/conftool/dbconfig/20250617-060347-marostegui.json
  • 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78111 and previous config saved to /var/cache/conftool/dbconfig/20250617-060129-root.json
  • 05:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T396130)', diff saved to https://phabricator.wikimedia.org/P78110 and previous config saved to /var/cache/conftool/dbconfig/20250617-055218-marostegui.json
  • 05:49 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78109 and previous config saved to /var/cache/conftool/dbconfig/20250617-054938-root.json
  • 05:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78108 and previous config saved to /var/cache/conftool/dbconfig/20250617-054922-root.json
  • 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P78107 and previous config saved to /var/cache/conftool/dbconfig/20250617-054623-root.json
  • 05:34 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78106 and previous config saved to /var/cache/conftool/dbconfig/20250617-053433-root.json
  • 05:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78105 and previous config saved to /var/cache/conftool/dbconfig/20250617-053416-root.json
  • 05:31 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P78104 and previous config saved to /var/cache/conftool/dbconfig/20250617-053117-root.json
  • 05:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T396130)', diff saved to https://phabricator.wikimedia.org/P78103 and previous config saved to /var/cache/conftool/dbconfig/20250617-052639-marostegui.json
  • 05:26 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 05:19 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78102 and previous config saved to /var/cache/conftool/dbconfig/20250617-051927-root.json
  • 05:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78101 and previous config saved to /var/cache/conftool/dbconfig/20250617-051911-root.json
  • 05:16 marostegui@cumin1002: dbctl commit (dc=all): 'es1027 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78100 and previous config saved to /var/cache/conftool/dbconfig/20250617-051612-root.json
  • 05:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2029.codfw.wmnet with reason: Maintenance
  • 05:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2029', diff saved to https://phabricator.wikimedia.org/P78099 and previous config saved to /var/cache/conftool/dbconfig/20250617-051231-marostegui.json
  • 05:12 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2027 to es3 codfw master', diff saved to https://phabricator.wikimedia.org/P78098 and previous config saved to /var/cache/conftool/dbconfig/20250617-051212-root.json
  • 05:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1027.eqiad.wmnet with reason: Maintenance
  • 05:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1027', diff saved to https://phabricator.wikimedia.org/P78097 and previous config saved to /var/cache/conftool/dbconfig/20250617-050618-marostegui.json
  • 05:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 05:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78096 and previous config saved to /var/cache/conftool/dbconfig/20250617-050405-root.json
  • 04:54 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 04:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1197', diff saved to https://phabricator.wikimedia.org/P78095 and previous config saved to /var/cache/conftool/dbconfig/20250617-045351-marostegui.json
  • 04:49 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 04:47 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.45.0-wmf.6 refs T392176 (duration: 104m 43s)
  • 03:55 ejegg: donorwiki upgraded from 8bcc8ff2 to 634580e5
  • 03:21 ejegg: civicrm upgraded from 85ee8461 to ec2cd980
  • 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.6 refs T392176
  • 02:44 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2005.codfw.wmnet with OS bookworm
  • 02:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2005.codfw.wmnet with OS bookworm
  • 02:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 02:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 02:21 eileen: config revision changed from cf6f679f to 0473ff4d
  • 02:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2005.codfw.wmnet with OS bookworm
  • 02:09 eileen: civicrm upgraded from 9e8392c2 to 85ee8461
  • 01:56 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp70[02-16].magru.wmnet} and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
  • 01:11 eileen: config revision changed from 9a6fc414 to cf6f679f
  • 01:00 eileen: revision changed from 433654ca to 022307e7
  • 00:56 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-eqsin and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
  • 00:51 eileen: config revision changed from 6b749d90 to 9a6fc414
  • 00:40 eileen: config revision changed from eb26458f to 6b749d90
  • 00:32 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2005.codfw.wmnet with OS bookworm
  • 00:31 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest2005']
  • 00:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest2005']
  • 00:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:15 eileen: config revision changed from cb49d01f to eb26458f

2025-06-16

  • 23:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:47 eileen: config revision changed from f4740838 to cb49d01f
  • 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1185.eqiad.wmnet with OS bullseye
  • 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:31 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:31 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2006.codfw.wmnet with OS bullseye
  • 23:24 eileen: config revision changed from f186d239 to f4740838
  • 23:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1185.eqiad.wmnet with reason: host reimage
  • 23:10 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1185.eqiad.wmnet with reason: host reimage
  • 23:09 eileen: config revision changed from 62010560 to f186d239
  • 22:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
  • 22:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
  • 22:35 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
  • 22:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:15 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1186
  • 22:15 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1186
  • 22:15 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1185
  • 22:14 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1185
  • 22:05 bvibber@deploy1003: Finished scap sync-world: Backport for Quiet test rollout of Lua transforms for Charts (T388616) (duration: 10m 22s)
  • 21:57 bvibber@deploy1003: bvibber: Continuing with sync
  • 21:56 bvibber@deploy1003: bvibber: Backport for Quiet test rollout of Lua transforms for Charts (T388616) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:54 bvibber@deploy1003: Started scap sync-world: Backport for Quiet test rollout of Lua transforms for Charts (T388616)
  • 21:51 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2006.codfw.wmnet with OS bullseye
  • 21:45 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:45 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:42 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:38 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:37 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:32 sbassett: Deployed security fix for T396946
  • 21:20 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:20 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:09 arlolra@deploy1003: Finished scap sync-world: Backport for Turn off glent m1 AB test (T262612) (duration: 09m 53s)
  • 21:06 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2006-dev.codfw.wmnet with OS bullseye
  • 21:05 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:05 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 21:04 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 21:03 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:03 eileen: config revision changed from 3d490c58 to 62010560
  • 21:03 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1186
  • 21:03 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1186
  • 21:02 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1185
  • 21:02 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1185
  • 21:02 arlolra@deploy1003: ebernhardson, arlolra: Continuing with sync
  • 21:02 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:01 arlolra@deploy1003: ebernhardson, arlolra: Backport for Turn off glent m1 AB test (T262612) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:00 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:00 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for an-worker1185/1186 - jclark@cumin1002"
  • 21:00 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for an-worker1185/1186 - jclark@cumin1002"
  • 20:59 arlolra@deploy1003: Started scap sync-world: Backport for Turn off glent m1 AB test (T262612)
  • 20:57 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 20:56 jclark@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:54 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 20:54 arlolra@deploy1003: Finished scap sync-world: Backport for Add arbcom group to ukwiki (T396668) (duration: 10m 36s)
  • 20:49 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:48 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2006-dev.codfw.wmnet with reason: host reimage
  • 20:48 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:48 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:47 arlolra@deploy1003: arlolra, eggroll97: Continuing with sync
  • 20:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:45 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2006-dev.codfw.wmnet with reason: host reimage
  • 20:45 arlolra@deploy1003: arlolra, eggroll97: Backport for Add arbcom group to ukwiki (T396668) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:45 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:43 arlolra@deploy1003: Started scap sync-world: Backport for Add arbcom group to ukwiki (T396668)
  • 20:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T395241)', diff saved to https://phabricator.wikimedia.org/P78092 and previous config saved to /var/cache/conftool/dbconfig/20250616-203526-fceratto.json
  • 20:26 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon2006-dev.codfw.wmnet with OS bullseye
  • 20:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P78091 and previous config saved to /var/cache/conftool/dbconfig/20250616-202019-fceratto.json
  • 20:19 arlolra@deploy1003: Finished scap sync-world: Backport for Disable VipsScaler in group2 (T290759) (duration: 09m 53s)
  • 20:17 ryankemper: T395855 Stopped opensearch units on `cirrussearch205[7,8]` (row B decom hosts)
  • 20:13 arlolra@deploy1003: arlolra: Continuing with sync
  • 20:12 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5001.eqsin.wmnet with OS bookworm
  • 20:12 arlolra@deploy1003: arlolra: Backport for Disable VipsScaler in group2 (T290759) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:10 arlolra@deploy1003: Started scap sync-world: Backport for Disable VipsScaler in group2 (T290759)
  • 20:09 brett: restarting pybal on lvs1017
  • 20:09 brett: restarting pybal on lvs1020
  • 20:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P78090 and previous config saved to /var/cache/conftool/dbconfig/20250616-200512-fceratto.json
  • 19:57 eileen: civicrm upgraded from 74bffcc4 to 9e8392c2
  • 19:52 bvibber@deploy1003: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
  • 19:52 bvibber@deploy1003: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
  • 19:52 bvibber@deploy1003: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
  • 19:51 bvibber@deploy1003: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
  • 19:51 bvibber@deploy1003: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
  • 19:50 bvibber@deploy1003: helmfile [staging] START helmfile.d/services/chart-renderer: apply
  • 19:50 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp70[02-16].magru.wmnet} and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
  • 19:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T395241)', diff saved to https://phabricator.wikimedia.org/P78089 and previous config saved to /var/cache/conftool/dbconfig/20250616-195004-fceratto.json
  • 19:49 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5001.eqsin.wmnet with reason: host reimage
  • 19:45 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5001.eqsin.wmnet with reason: host reimage
  • 19:41 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2222 (T395241)', diff saved to https://phabricator.wikimedia.org/P78088 and previous config saved to /var/cache/conftool/dbconfig/20250616-194140-fceratto.json
  • 19:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2222.codfw.wmnet with reason: Maintenance
  • 19:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T395241)', diff saved to https://phabricator.wikimedia.org/P78087 and previous config saved to /var/cache/conftool/dbconfig/20250616-194123-fceratto.json
  • 19:34 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum4002.ulsfo.wmnet with OS bookworm
  • 19:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P78086 and previous config saved to /var/cache/conftool/dbconfig/20250616-192615-fceratto.json
  • 19:16 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum4002.ulsfo.wmnet with reason: host reimage
  • 19:15 sukhe@dns1004: END - running authdns-update
  • 19:14 sukhe@dns1004: START - running authdns-update
  • 19:13 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum4002.ulsfo.wmnet with reason: host reimage
  • 19:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P78085 and previous config saved to /var/cache/conftool/dbconfig/20250616-191108-fceratto.json
  • 19:02 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host durum5001.eqsin.wmnet with OS bookworm
  • 18:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T395241)', diff saved to https://phabricator.wikimedia.org/P78084 and previous config saved to /var/cache/conftool/dbconfig/20250616-185600-fceratto.json
  • 18:53 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host durum4002.ulsfo.wmnet with OS bookworm
  • 18:50 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum4001.ulsfo.wmnet with OS bookworm
  • 18:47 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2221 (T395241)', diff saved to https://phabricator.wikimedia.org/P78083 and previous config saved to /var/cache/conftool/dbconfig/20250616-184731-fceratto.json
  • 18:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2221.codfw.wmnet with reason: Maintenance
  • 18:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T395241)', diff saved to https://phabricator.wikimedia.org/P78082 and previous config saved to /var/cache/conftool/dbconfig/20250616-184704-fceratto.json
  • 18:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:32 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
  • 18:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P78081 and previous config saved to /var/cache/conftool/dbconfig/20250616-183156-fceratto.json
  • 18:29 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum2002.codfw.wmnet with OS bookworm
  • 18:28 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
  • 18:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P78080 and previous config saved to /var/cache/conftool/dbconfig/20250616-181649-fceratto.json
  • 18:13 urandom: bootstrapping sessionstore2004-a/Cassandra — T390514
  • 18:12 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum2002.codfw.wmnet with reason: host reimage
  • 18:08 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum2002.codfw.wmnet with reason: host reimage
  • 18:08 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host durum4001.ulsfo.wmnet with OS bookworm
  • 18:06 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2005-dev.codfw.wmnet with OS bullseye
  • 18:04 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 6 hosts with reason: begin decom/remove hosts from cluster
  • 18:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T395241)', diff saved to https://phabricator.wikimedia.org/P78079 and previous config saved to /var/cache/conftool/dbconfig/20250616-180141-fceratto.json
  • 17:53 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2220 (T395241)', diff saved to https://phabricator.wikimedia.org/P78078 and previous config saved to /var/cache/conftool/dbconfig/20250616-175317-fceratto.json
  • 17:53 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 17:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T395241)', diff saved to https://phabricator.wikimedia.org/P78077 and previous config saved to /var/cache/conftool/dbconfig/20250616-175248-fceratto.json
  • 17:52 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2004.codfw.wmnet with OS bullseye
  • 17:50 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host durum2002.codfw.wmnet with OS bookworm
  • 17:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P78076 and previous config saved to /var/cache/conftool/dbconfig/20250616-173741-fceratto.json
  • 17:36 sukhe: sudo cumin -b1 -s10 'A:durum' 'run-puppet-agent --enable "merging CR 1159541"'
  • 17:33 sukhe: disable puppet on A:durum to roll out CR 1159541
  • 17:28 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
  • 17:25 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum2001.codfw.wmnet with OS bookworm
  • 17:24 eevans@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
  • 17:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P78075 and previous config saved to /var/cache/conftool/dbconfig/20250616-172234-fceratto.json
  • 17:20 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum1002.eqiad.wmnet with OS bookworm
  • 17:17 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 17:16 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 17:12 swfrench@deploy1003: Finished scap sync-world: Scap run to test newly enabled dse-k8s-eqiad deployment - T389786 (duration: 02m 15s)
  • 17:11 swfrench@deploy1003: Started scap sync-world: Scap run to test newly enabled dse-k8s-eqiad deployment - T389786
  • 17:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T395241)', diff saved to https://phabricator.wikimedia.org/P78074 and previous config saved to /var/cache/conftool/dbconfig/20250616-170726-fceratto.json
  • 17:07 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum2001.codfw.wmnet with reason: host reimage
  • 17:06 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
  • 17:03 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum2001.codfw.wmnet with reason: host reimage
  • 17:03 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
  • 16:58 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T395241)', diff saved to https://phabricator.wikimedia.org/P78073 and previous config saved to /var/cache/conftool/dbconfig/20250616-165855-fceratto.json
  • 16:58 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 16:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T395241)', diff saved to https://phabricator.wikimedia.org/P78072 and previous config saved to /var/cache/conftool/dbconfig/20250616-165825-fceratto.json
  • 16:58 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
  • 16:55 eevans@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore2004.codfw.wmnet with OS bullseye
  • 16:50 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-eqsin and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
  • 16:50 brett@cumin2002: END (ERROR) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=97) rolling upgrade of Varnish on A:cp-eqsin and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
  • 16:50 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-eqsin and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
  • 16:46 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
  • 16:45 eevans@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
  • 16:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P78071 and previous config saved to /var/cache/conftool/dbconfig/20250616-164317-fceratto.json
  • 16:43 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host durum2001.codfw.wmnet with OS bookworm
  • 16:43 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host durum1002.eqiad.wmnet with OS bookworm
  • 16:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P78070 and previous config saved to /var/cache/conftool/dbconfig/20250616-162810-fceratto.json
  • 16:23 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum1001.eqiad.wmnet with OS bookworm
  • 16:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T395241)', diff saved to https://phabricator.wikimedia.org/P78069 and previous config saved to /var/cache/conftool/dbconfig/20250616-161303-fceratto.json
  • 16:12 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 16:12 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 16:12 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 16:11 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 16:11 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 16:11 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 16:10 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 16:09 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 16:09 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 16:09 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 16:06 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1001.eqiad.wmnet with reason: host reimage
  • 16:03 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1001.eqiad.wmnet with reason: host reimage
  • 16:02 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T395241)', diff saved to https://phabricator.wikimedia.org/P78068 and previous config saved to /var/cache/conftool/dbconfig/20250616-160220-fceratto.json
  • 16:02 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 16:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T395241)', diff saved to https://phabricator.wikimedia.org/P78067 and previous config saved to /var/cache/conftool/dbconfig/20250616-160203-fceratto.json
  • 15:58 dancy@deploy1003: Installation of scap version "4.175.0" completed for 2 hosts
  • 15:56 dancy@deploy1003: Installing scap version "4.175.0" for 2 host(s)
  • 15:55 jdrewniak@deploy1003: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 02m 19s)
  • 15:53 jdrewniak@deploy1003: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 09m 21s)
  • 15:49 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
  • 15:47 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host durum1001.eqiad.wmnet with OS bookworm
  • 15:47 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet
  • 15:47 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1013.eqiad.wmnet
  • 15:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P78066 and previous config saved to /var/cache/conftool/dbconfig/20250616-154656-fceratto.json
  • 15:41 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2005-dev.codfw.wmnet with reason: host reimage
  • 15:37 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2005-dev.codfw.wmnet with reason: host reimage
  • 15:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P78065 and previous config saved to /var/cache/conftool/dbconfig/20250616-153148-fceratto.json
  • 15:30 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Revert "Enable temporary accounts onboarding dialog on WMF wikis" (duration: 24m 48s)
  • 15:29 urandom: decommissioning sessionstore2004-a/Cassandra — T391544
  • 15:22 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 15:20 fnegri@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1013.eqiad.wmnet with reason: Upgrading clouddbs T394372
  • 15:20 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1013.eqiad.wmnet
  • 15:17 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon2005-dev.codfw.wmnet with OS bullseye
  • 15:17 inflatador: bking@cumin2002:~$ sudo cumin A:lvs-low-traffic 'run-puppet-agent' T387569
  • 15:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T395241)', diff saved to https://phabricator.wikimedia.org/P78064 and previous config saved to /var/cache/conftool/dbconfig/20250616-151641-fceratto.json
  • 15:11 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20250526/ using stat1009.eqiad.wmnet)
  • 15:09 dreamyjazz@deploy1003: dreamyjazz: Backport for Revert "Enable temporary accounts onboarding dialog on WMF wikis" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:06 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T395241)', diff saved to https://phabricator.wikimedia.org/P78063 and previous config saved to /var/cache/conftool/dbconfig/20250616-150609-fceratto.json
  • 15:06 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 15:05 dreamyjazz@deploy1003: Started scap sync-world: Backport for Revert "Enable temporary accounts onboarding dialog on WMF wikis"
  • 15:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T395241)', diff saved to https://phabricator.wikimedia.org/P78062 and previous config saved to /var/cache/conftool/dbconfig/20250616-150541-fceratto.json
  • 15:04 mszabo@deploy1003: Finished scap sync-world: Backport for Add missing labels for email confirmation reminder preferences (T58074) (duration: 53m 29s)
  • 14:58 vgutierrez@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgrading P{lvs7001.magru.wmnet} and A:liberica (T397036)
  • 14:58 vgutierrez@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P{lvs7001.magru.wmnet} and A:liberica
  • 14:58 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P{lvs7001.magru.wmnet} and A:liberica
  • 14:57 vgutierrez@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs7001.magru.wmnet} and A:liberica
  • 14:57 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.admin depooling P{lvs7001.magru.wmnet} and A:liberica
  • 14:57 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.upgrade upgrading P{lvs7001.magru.wmnet} and A:liberica (T397036)
  • 14:54 vgutierrez@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgrading P{lvs7002.magru.wmnet} and A:liberica (T397036)
  • 14:54 vgutierrez@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P{lvs7002.magru.wmnet} and A:liberica
  • 14:53 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P{lvs7002.magru.wmnet} and A:liberica
  • 14:53 vgutierrez@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs7002.magru.wmnet} and A:liberica
  • 14:53 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.admin depooling P{lvs7002.magru.wmnet} and A:liberica
  • 14:53 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.upgrade upgrading P{lvs7002.magru.wmnet} and A:liberica (T397036)
  • 14:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P78060 and previous config saved to /var/cache/conftool/dbconfig/20250616-145032-fceratto.json
  • 14:50 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2004-dev.codfw.wmnet with OS bullseye
  • 14:49 mszabo@deploy1003: mszabo: Continuing with sync
  • 14:48 mszabo@deploy1003: mszabo: Backport for Add missing labels for email confirmation reminder preferences (T58074) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:41 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 14:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T396130)', diff saved to https://phabricator.wikimedia.org/P78059 and previous config saved to /var/cache/conftool/dbconfig/20250616-144127-marostegui.json
  • 14:36 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus7002.magru.wmnet to drbd
  • 14:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P78058 and previous config saved to /var/cache/conftool/dbconfig/20250616-143525-fceratto.json
  • 14:32 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 14:29 vgutierrez@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgrading P{lvs1013.eqiad.wmnet} and A:liberica (T397036)
  • 14:28 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.upgrade upgrading P{lvs1013.eqiad.wmnet} and A:liberica (T397036)
  • 14:28 vgutierrez: upgrade to liberica 0.19 in lvs1013 - T397036
  • 14:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P78057 and previous config saved to /var/cache/conftool/dbconfig/20250616-142620-marostegui.json
  • 14:25 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 14:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T395241)', diff saved to https://phabricator.wikimedia.org/P78056 and previous config saved to /var/cache/conftool/dbconfig/20250616-142017-fceratto.json
  • 14:19 vgutierrez: upload liberica 0.19 to apt.wm.o (bookworm-wikimedia) - T397036
  • 14:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:13 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:13 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P78055 and previous config saved to /var/cache/conftool/dbconfig/20250616-141113-marostegui.json
  • 14:10 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T395241)', diff saved to https://phabricator.wikimedia.org/P78054 and previous config saved to /var/cache/conftool/dbconfig/20250616-141044-fceratto.json
  • 14:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 14:10 mszabo@deploy1003: Started scap sync-world: Backport for Add missing labels for email confirmation reminder preferences (T58074)
  • 14:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T395241)', diff saved to https://phabricator.wikimedia.org/P78053 and previous config saved to /var/cache/conftool/dbconfig/20250616-141016-fceratto.json
  • 14:09 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon2004-dev.codfw.wmnet with OS bullseye
  • 14:08 marostegui@cumin1002: dbctl commit (dc=all): 'es1026 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78052 and previous config saved to /var/cache/conftool/dbconfig/20250616-140807-root.json
  • 14:05 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:05 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:05 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner2002.codfw.wmnet with OS bookworm
  • 14:01 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2004-dev.codfw.wmnet with OS bullseye
  • 13:57 vgutierrez: use Google Trust Services (GTS) unified TLS certificate on esams - T395131
  • 13:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T396130)', diff saved to https://phabricator.wikimedia.org/P78051 and previous config saved to /var/cache/conftool/dbconfig/20250616-135605-marostegui.json
  • 13:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P78050 and previous config saved to /var/cache/conftool/dbconfig/20250616-135507-fceratto.json
  • 13:54 phuedx@deploy1003: Finished scap sync-world: Backport for Try subresource JS autologin on SUL3 domain first if configured (T391284), Fix adding warnings to ParserOutput (T396768) (duration: 13m 09s)
  • 13:53 marostegui@cumin1002: dbctl commit (dc=all): 'es1026 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78049 and previous config saved to /var/cache/conftool/dbconfig/20250616-135301-root.json
  • 13:52 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-backup2001.codfw.wmnet with reason: Maintenance and reboot
  • 13:47 phuedx@deploy1003: phuedx, matmarex: Continuing with sync
  • 13:47 sukhe: enable puppet and run agent on cephosd1001
  • 13:45 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner2002.codfw.wmnet with reason: host reimage
  • 13:43 phuedx@deploy1003: phuedx, matmarex: Backport for Try subresource JS autologin on SUL3 domain first if configured (T391284), Fix adding warnings to ParserOutput (T396768) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:42 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner2002.codfw.wmnet with reason: host reimage
  • 13:41 phuedx@deploy1003: Started scap sync-world: Backport for Try subresource JS autologin on SUL3 domain first if configured (T391284), Fix adding warnings to ParserOutput (T396768)
  • 13:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T396130)', diff saved to https://phabricator.wikimedia.org/P78048 and previous config saved to /var/cache/conftool/dbconfig/20250616-134036-marostegui.json
  • 13:40 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 13:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T396130)', diff saved to https://phabricator.wikimedia.org/P78047 and previous config saved to /var/cache/conftool/dbconfig/20250616-134012-marostegui.json
  • 13:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P78046 and previous config saved to /var/cache/conftool/dbconfig/20250616-134000-fceratto.json
  • 13:39 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 13:38 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus7002.magru.wmnet to drbd
  • 13:37 marostegui@cumin1002: dbctl commit (dc=all): 'es1026 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78045 and previous config saved to /var/cache/conftool/dbconfig/20250616-133755-root.json
  • 13:35 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 13:35 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7003.magru.wmnet to drbd
  • 13:33 phuedx@deploy1003: Finished scap sync-world: Backport for Revert "Change citoid config for test wiki" (duration: 14m 22s)
  • 13:32 sukhe: sudo cumin -b1 -s30 'A:wikidough' "run-puppet-agent --enable 'CR1052109'": T362392
  • 13:32 sukhe: sudo cumin -b1 -s30 'A:dnsbox' "run-puppet-agent --enable 'CR1052109'": T362392
  • 13:31 sukhe: T362392
  • 13:27 phuedx@deploy1003: mvolz, phuedx: Continuing with sync
  • 13:25 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7003.magru.wmnet to drbd
  • 13:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P78044 and previous config saved to /var/cache/conftool/dbconfig/20250616-132504-marostegui.json
  • 13:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T395241)', diff saved to https://phabricator.wikimedia.org/P78043 and previous config saved to /var/cache/conftool/dbconfig/20250616-132452-fceratto.json
  • 13:24 jelto@cumin1002: START - Cookbook sre.hosts.reimage for host gitlab-runner2002.codfw.wmnet with OS bookworm
  • 13:23 root@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-backup2002.codfw.wmnet: Renew puppet certificate - root@cumin1002
  • 13:22 marostegui@cumin1002: dbctl commit (dc=all): 'es1026 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78042 and previous config saved to /var/cache/conftool/dbconfig/20250616-132250-root.json
  • 13:21 phuedx@deploy1003: mvolz, phuedx: Backport for Revert "Change citoid config for test wiki" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:19 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon2004-dev.codfw.wmnet with OS bullseye
  • 13:19 phuedx@deploy1003: Started scap sync-world: Backport for Revert "Change citoid config for test wiki"
  • 13:19 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephmon2004-dev.codfw.wmnet with OS bullseye
  • 13:18 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast7002.wikimedia.org to drbd
  • 13:17 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1026.eqiad.wmnet with reason: Maintenance
  • 13:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1026 T395241', diff saved to https://phabricator.wikimedia.org/P78041 and previous config saved to /var/cache/conftool/dbconfig/20250616-131646-marostegui.json
  • 13:16 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 13:16 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 13:16 phuedx@deploy1003: Sync cancelled.
  • 13:14 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T395241)', diff saved to https://phabricator.wikimedia.org/P78040 and previous config saved to /var/cache/conftool/dbconfig/20250616-131410-fceratto.json
  • 13:14 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 13:11 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: tested CR 1052109]
  • 13:10 phuedx@deploy1003: phuedx, mvolz, dreamyjazz, tchanders: Backport for ext-EventStreamConfig: Update product_metrics.web_base stream (T395692), Set $wgCentralAuthAutomaticGlobalGroups for global IP reveal group (T376315), Enable temporary accounts onboarding dialog on WMF wikis (T395933), Change citoid config for test wiki (T361576) synced to t
  • 13:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P78039 and previous config saved to /var/cache/conftool/dbconfig/20250616-130950-marostegui.json
  • 13:08 phuedx@deploy1003: Started scap sync-world: Backport for ext-EventStreamConfig: Update product_metrics.web_base stream (T395692), Set $wgCentralAuthAutomaticGlobalGroups for global IP reveal group (T376315), Enable temporary accounts onboarding dialog on WMF wikis (T395933), Change citoid config for test wiki (T361576)
  • 13:08 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: bird testing CR 1052109]
  • 13:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2080.codfw.wmnet with OS bullseye
  • 13:04 XioNoX: disable puppet on all hosts using the bird puppet module for https://gerrit.wikimedia.org/r/c/operations/puppet/+/1052109
  • 12:59 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of bast7002.wikimedia.org to drbd
  • 12:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T396130)', diff saved to https://phabricator.wikimedia.org/P78037 and previous config saved to /var/cache/conftool/dbconfig/20250616-125442-marostegui.json
  • 12:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2080.codfw.wmnet with reason: host reimage
  • 12:42 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2080.codfw.wmnet with reason: host reimage
  • 12:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T396130)', diff saved to https://phabricator.wikimedia.org/P78036 and previous config saved to /var/cache/conftool/dbconfig/20250616-124002-marostegui.json
  • 12:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 12:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T396130)', diff saved to https://phabricator.wikimedia.org/P78035 and previous config saved to /var/cache/conftool/dbconfig/20250616-123939-marostegui.json
  • 12:37 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 12:34 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7003.wikimedia.org to drbd
  • 12:34 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 12:25 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2080
  • 12:25 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2080
  • 12:25 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2080
  • 12:25 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2080.codfw.wmnet 245.48.192.10.in-addr.arpa 5.4.2.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 12:25 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2080.codfw.wmnet 245.48.192.10.in-addr.arpa 5.4.2.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 12:25 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:25 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2080 - mvernon@cumin2002"
  • 12:24 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2080 - mvernon@cumin2002"
  • 12:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P78034 and previous config saved to /var/cache/conftool/dbconfig/20250616-122432-marostegui.json
  • 12:18 mvernon@cumin2002: START - Cookbook sre.dns.netbox
  • 12:18 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2080
  • 12:18 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon2004-dev.codfw.wmnet with OS bullseye
  • 12:18 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2080.codfw.wmnet with OS bullseye
  • 12:17 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7003.wikimedia.org to drbd
  • 12:15 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner1004.eqiad.wmnet with OS bookworm
  • 12:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P78033 and previous config saved to /var/cache/conftool/dbconfig/20250616-120924-marostegui.json
  • 12:06 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7003.magru.wmnet to drbd
  • 11:57 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner1004.eqiad.wmnet with reason: host reimage
  • 11:56 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7003.magru.wmnet to drbd
  • 11:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T396130)', diff saved to https://phabricator.wikimedia.org/P78031 and previous config saved to /var/cache/conftool/dbconfig/20250616-115417-marostegui.json
  • 11:54 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner1004.eqiad.wmnet with reason: host reimage
  • 11:51 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti7003.magru.wmnet to cluster magru03 and group B
  • 11:50 jmm@cumin1003: START - Cookbook sre.ganeti.addnode for new host ganeti7003.magru.wmnet to cluster magru03 and group B
  • 11:50 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 11:50 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 11:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1229 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78030 and previous config saved to /var/cache/conftool/dbconfig/20250616-114138-root.json
  • 11:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T396130)', diff saved to https://phabricator.wikimedia.org/P78029 and previous config saved to /var/cache/conftool/dbconfig/20250616-113938-marostegui.json
  • 11:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 11:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T396130)', diff saved to https://phabricator.wikimedia.org/P78028 and previous config saved to /var/cache/conftool/dbconfig/20250616-113915-marostegui.json
  • 11:37 jelto@cumin1002: START - Cookbook sre.hosts.reimage for host gitlab-runner1004.eqiad.wmnet with OS bookworm
  • 11:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1229 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78027 and previous config saved to /var/cache/conftool/dbconfig/20250616-112633-root.json
  • 11:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P78026 and previous config saved to /var/cache/conftool/dbconfig/20250616-112408-marostegui.json
  • 11:19 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7003.magru.wmnet
  • 11:15 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1252* slowly with 10 steps - Pooling in
  • 11:14 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-backup2002.codfw.wmnet with reason: Maintenance and reboot
  • 11:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1229 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78023 and previous config saved to /var/cache/conftool/dbconfig/20250616-111127-root.json
  • 11:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P78022 and previous config saved to /var/cache/conftool/dbconfig/20250616-110901-marostegui.json
  • 11:08 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti7003.magru.wmnet
  • 10:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T395241)', diff saved to https://phabricator.wikimedia.org/P78020 and previous config saved to /var/cache/conftool/dbconfig/20250616-105806-fceratto.json
  • 10:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1229 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78019 and previous config saved to /var/cache/conftool/dbconfig/20250616-105621-root.json
  • 10:55 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 10:55 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 10:54 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 10:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T396130)', diff saved to https://phabricator.wikimedia.org/P78018 and previous config saved to /var/cache/conftool/dbconfig/20250616-105353-marostegui.json
  • 10:53 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 10:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P78016 and previous config saved to /var/cache/conftool/dbconfig/20250616-104259-fceratto.json
  • 10:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T396130)', diff saved to https://phabricator.wikimedia.org/P78015 and previous config saved to /var/cache/conftool/dbconfig/20250616-103720-marostegui.json
  • 10:37 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 10:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T396130)', diff saved to https://phabricator.wikimedia.org/P78014 and previous config saved to /var/cache/conftool/dbconfig/20250616-103657-marostegui.json
  • 10:34 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 10:34 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 10:30 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 10:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1229 T396549', diff saved to https://phabricator.wikimedia.org/P78012 and previous config saved to /var/cache/conftool/dbconfig/20250616-102949-marostegui.json
  • 10:29 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 10:29 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 10:29 claime: Manual run job.batch/update-special-pages-s8-manual-202506161028 started - T396977
  • 10:29 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 10:28 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 10:28 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 10:28 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 10:28 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 10:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P78011 and previous config saved to /var/cache/conftool/dbconfig/20250616-102752-fceratto.json
  • 10:26 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 10:25 moritzm: installing qemu security updates
  • 10:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P78010 and previous config saved to /var/cache/conftool/dbconfig/20250616-102150-marostegui.json
  • 10:16 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:15 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T395241)', diff saved to https://phabricator.wikimedia.org/P78008 and previous config saved to /var/cache/conftool/dbconfig/20250616-101244-fceratto.json
  • 10:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P78006 and previous config saved to /var/cache/conftool/dbconfig/20250616-100642-marostegui.json
  • 10:05 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T395241)', diff saved to https://phabricator.wikimedia.org/P78005 and previous config saved to /var/cache/conftool/dbconfig/20250616-100521-fceratto.json
  • 10:05 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove magru01 cluster - jmm@cumin2002"
  • 09:59 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove magru01 cluster - jmm@cumin2002"
  • 09:57 zabe@deploy1003: Finished scap sync-world: Backport for wikidatawiki: Increase revision-slots cache back to default (T183490), Stop setting $wgPageLinksSchemaMigrationStage (T299947) (duration: 12m 46s)
  • 09:54 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T396130)', diff saved to https://phabricator.wikimedia.org/P78002 and previous config saved to /var/cache/conftool/dbconfig/20250616-095135-marostegui.json
  • 09:51 zabe@deploy1003: zabe: Continuing with sync
  • 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:47 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:47 zabe@deploy1003: zabe: Backport for wikidatawiki: Increase revision-slots cache back to default (T183490), Stop setting $wgPageLinksSchemaMigrationStage (T299947) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:45 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "remove ganeti7003 - jmm@cumin2002"
  • 09:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "remove ganeti7003 - jmm@cumin2002"
  • 09:45 zabe@deploy1003: Started scap sync-world: Backport for wikidatawiki: Increase revision-slots cache back to default (T183490), Stop setting $wgPageLinksSchemaMigrationStage (T299947)
  • 09:44 moritzm: remove magru01 in Netbox (all Ganeti nodes have been removed from it) T394263
  • 09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T396130)', diff saved to https://phabricator.wikimedia.org/P78000 and previous config saved to /var/cache/conftool/dbconfig/20250616-093451-marostegui.json
  • 09:34 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 09:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1254 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77999 and previous config saved to /var/cache/conftool/dbconfig/20250616-093442-root.json
  • 09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T396130)', diff saved to https://phabricator.wikimedia.org/P77998 and previous config saved to /var/cache/conftool/dbconfig/20250616-093429-marostegui.json
  • 09:31 zabe: zabe@deploy1003:~$ mwscript extensions/AbuseFilter/maintenance/MigrateESRefToAflTable.php wikidatawiki --deletedump /home/zabe/afl_text_table_deletedump/wikidatawiki --dump /home/zabe/afl_text_table_dump/wikidatawiki --sleep 0.5 # T381599
  • 09:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs7001.magru.wmnet} and A:liberica
  • 09:31 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs7001.magru.wmnet} and A:liberica
  • 09:31 vgutierrez: repool lvs7001 using katran as forwarding plane - T396561
  • 09:26 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs7001.magru.wmnet
  • 09:26 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs7001.magru.wmnet
  • 09:23 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7003.magru.wmnet with OS bookworm
  • 09:23 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner1003.eqiad.wmnet with OS bookworm
  • 09:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1254 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77996 and previous config saved to /var/cache/conftool/dbconfig/20250616-091936-root.json
  • 09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P77995 and previous config saved to /var/cache/conftool/dbconfig/20250616-091921-marostegui.json
  • 09:14 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1252* slowly with 10 steps - Pooling in
  • 09:12 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) db1252* slowly with 10 steps - Pooling in
  • 09:10 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1252* slowly with 10 steps - Pooling in
  • 09:04 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77992 and previous config saved to /var/cache/conftool/dbconfig/20250616-090439-root.json
  • 09:04 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner1003.eqiad.wmnet with reason: host reimage
  • 09:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1254 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77991 and previous config saved to /var/cache/conftool/dbconfig/20250616-090431-root.json
  • 09:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P77990 and previous config saved to /var/cache/conftool/dbconfig/20250616-090414-marostegui.json
  • 09:01 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7003.magru.wmnet with reason: host reimage
  • 09:00 fceratto@cumin1002: dbctl commit (dc=all): 'Add db1252', diff saved to https://phabricator.wikimedia.org/P77989 and previous config saved to /var/cache/conftool/dbconfig/20250616-090058-fceratto.json
  • 09:00 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner1003.eqiad.wmnet with reason: host reimage
  • 08:58 vgutierrez: repool ncredir7003
  • 08:57 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7003.magru.wmnet with reason: host reimage
  • 08:54 vgutierrez: depooling ncredir7003
  • 08:51 zabe: zabe@deploy1003:~$ mwscript extensions/WikimediaMaintenance/migrateESRefToContentTableStage2.php wikidatawiki --delete /home/zabe/text_table_cleanup/wikidatawiki --sleep 0.5 # T183490
  • 08:49 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77988 and previous config saved to /var/cache/conftool/dbconfig/20250616-084934-root.json
  • 08:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1254 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77987 and previous config saved to /var/cache/conftool/dbconfig/20250616-084925-root.json
  • 08:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T396130)', diff saved to https://phabricator.wikimedia.org/P77986 and previous config saved to /var/cache/conftool/dbconfig/20250616-084907-marostegui.json
  • 08:48 taavi: cr policy: rename cr-labs to cr-cloud-hosts (https://gerrit.wikimedia.org/r/c/operations/homer/public/+/1159360)
  • 08:44 jelto@cumin1002: START - Cookbook sre.hosts.reimage for host gitlab-runner1003.eqiad.wmnet with OS bookworm
  • 08:36 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti7003.magru.wmnet with OS bookworm
  • 08:35 jakob@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 08:35 jakob@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 08:34 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P77985 and previous config saved to /var/cache/conftool/dbconfig/20250616-083428-root.json
  • 08:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1254 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77984 and previous config saved to /var/cache/conftool/dbconfig/20250616-083419-root.json
  • 08:31 jakob@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 08:31 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs7001.magru.wmnet with reason: switching to katran
  • 08:31 jakob@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 08:30 jakob@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 08:30 jakob@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 08:29 ladsgroup@deploy1003: Finished scap sync-world: Backport for IP cap lift for wikipedia workshop - cs.wikipedia on 19June2025 (T396980) (duration: 10m 13s)
  • 08:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T396130)', diff saved to https://phabricator.wikimedia.org/P77983 and previous config saved to /var/cache/conftool/dbconfig/20250616-082933-marostegui.json
  • 08:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 08:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T396130)', diff saved to https://phabricator.wikimedia.org/P77982 and previous config saved to /var/cache/conftool/dbconfig/20250616-082910-marostegui.json
  • 08:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1159.eqiad.wmnet
  • 08:28 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1254.eqiad.wmnet with reason: Maintenance
  • 08:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1254 T396549', diff saved to https://phabricator.wikimedia.org/P77981 and previous config saved to /var/cache/conftool/dbconfig/20250616-082841-marostegui.json
  • 08:22 ladsgroup@deploy1003: anzx, ladsgroup: Continuing with sync
  • 08:21 ladsgroup@deploy1003: anzx, ladsgroup: Backport for IP cap lift for wikipedia workshop - cs.wikipedia on 19June2025 (T396980) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:21 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1159.eqiad.wmnet
  • 08:20 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1158.eqiad.wmnet
  • 08:20 vgutierrez@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs7001.magru.wmnet} and A:liberica (T396561)
  • 08:19 ladsgroup@deploy1003: Started scap sync-world: Backport for IP cap lift for wikipedia workshop - cs.wikipedia on 19June2025 (T396980)
  • 08:19 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77980 and previous config saved to /var/cache/conftool/dbconfig/20250616-081922-root.json
  • 08:19 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.admin depooling P{lvs7001.magru.wmnet} and A:liberica (T396561)
  • 08:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for mrwiki: add मसूदा (draft) namespace (T396551) (duration: 15m 11s)
  • 08:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P77979 and previous config saved to /var/cache/conftool/dbconfig/20250616-081402-marostegui.json
  • 08:13 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1158.eqiad.wmnet
  • 08:13 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1157.eqiad.wmnet
  • 08:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 08:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 08:05 ladsgroup@deploy1003: ladsgroup, anzx: Continuing with sync
  • 08:05 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1157.eqiad.wmnet
  • 08:04 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 08:04 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 08:04 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1162.eqiad.wmnet
  • 08:03 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 08:03 ladsgroup@deploy1003: ladsgroup, anzx: Backport for mrwiki: add मसूदा (draft) namespace (T396551) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:03 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 08:00 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 07:59 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 07:59 ladsgroup@deploy1003: Started scap sync-world: Backport for mrwiki: add मसूदा (draft) namespace (T396551)
  • 07:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P77978 and previous config saved to /var/cache/conftool/dbconfig/20250616-075855-marostegui.json
  • 07:57 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
  • 07:56 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
  • 07:56 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1162.eqiad.wmnet
  • 07:55 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1175-1176].eqiad.wmnet with reason: Upgrade an-worker hard drives from 4TB to 8TB group 9 and 10
  • 07:55 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
  • 07:55 ladsgroup@deploy1003: Finished scap sync-world: Backport for Enable sub-referencing on test wiki (T395871) (duration: 40m 51s)
  • 07:55 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
  • 07:55 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1149-1153].eqiad.wmnet with reason: Upgrade an-worker hard drives from 4TB to 8TB group 9 and 10
  • 07:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 07:53 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 07:53 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1161.eqiad.wmnet
  • 07:45 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1161.eqiad.wmnet
  • 07:44 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1160.eqiad.wmnet
  • 07:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T396130)', diff saved to https://phabricator.wikimedia.org/P77977 and previous config saved to /var/cache/conftool/dbconfig/20250616-074346-marostegui.json
  • 07:42 ladsgroup@deploy1003: lilients, ladsgroup: Continuing with sync
  • 07:36 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0) reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20250526/ using stat1009.eqiad.wmnet)
  • 07:35 ladsgroup@deploy1003: lilients, ladsgroup: Backport for Enable sub-referencing on test wiki (T395871) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 07:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2207 T396976', diff saved to https://phabricator.wikimedia.org/P77976 and previous config saved to /var/cache/conftool/dbconfig/20250616-073045-marostegui.json
  • 07:29 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2204 to s2 primary T396976', diff saved to https://phabricator.wikimedia.org/P77975 and previous config saved to /var/cache/conftool/dbconfig/20250616-072955-root.json
  • 07:29 marostegui: Starting s2 codfw failover from db2207 to db2204 - T396976
  • 07:28 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1160.eqiad.wmnet
  • 07:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T396130)', diff saved to https://phabricator.wikimedia.org/P77974 and previous config saved to /var/cache/conftool/dbconfig/20250616-072702-marostegui.json
  • 07:26 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 07:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T396130)', diff saved to https://phabricator.wikimedia.org/P77973 and previous config saved to /var/cache/conftool/dbconfig/20250616-072640-marostegui.json
  • 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 07:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 07:14 ladsgroup@deploy1003: Started scap sync-world: Backport for Enable sub-referencing on test wiki (T395871)
  • 07:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P77971 and previous config saved to /var/cache/conftool/dbconfig/20250616-071132-marostegui.json
  • 07:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 T396976
  • 07:05 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2204 with weight 0 T396976', diff saved to https://phabricator.wikimedia.org/P77970 and previous config saved to /var/cache/conftool/dbconfig/20250616-070524-root.json
  • 07:00 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install7001.wikimedia.org
  • 07:00 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:00 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install7001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 07:00 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install7001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 06:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P77969 and previous config saved to /var/cache/conftool/dbconfig/20250616-065625-marostegui.json
  • 06:55 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 06:47 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts install7001.wikimedia.org
  • 06:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T396130)', diff saved to https://phabricator.wikimedia.org/P77968 and previous config saved to /var/cache/conftool/dbconfig/20250616-064117-marostegui.json
  • 06:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T396130)', diff saved to https://phabricator.wikimedia.org/P77967 and previous config saved to /var/cache/conftool/dbconfig/20250616-062536-marostegui.json
  • 06:25 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 06:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T396130)', diff saved to https://phabricator.wikimedia.org/P77966 and previous config saved to /var/cache/conftool/dbconfig/20250616-061053-marostegui.json
  • 05:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P77965 and previous config saved to /var/cache/conftool/dbconfig/20250616-055545-marostegui.json
  • 05:49 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 151326
  • 05:48 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 151326
  • 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77964 and previous config saved to /var/cache/conftool/dbconfig/20250616-054656-root.json
  • 05:42 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1161.eqiad.wmnet
  • 05:42 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1161.eqiad.wmnet
  • 05:41 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1161.eqiad.wmnet
  • 05:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P77963 and previous config saved to /var/cache/conftool/dbconfig/20250616-054037-marostegui.json
  • 05:38 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1161.eqiad.wmnet
  • 05:37 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts an-worker1162.eqiad.wmnet
  • 05:35 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1162.eqiad.wmnet
  • 05:35 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1160.eqiad.wmnet
  • 05:33 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1160.eqiad.wmnet
  • 05:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77962 and previous config saved to /var/cache/conftool/dbconfig/20250616-053150-root.json
  • 05:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T396130)', diff saved to https://phabricator.wikimedia.org/P77961 and previous config saved to /var/cache/conftool/dbconfig/20250616-052530-marostegui.json
  • 05:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P77960 and previous config saved to /var/cache/conftool/dbconfig/20250616-051644-root.json
  • 05:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T396130)', diff saved to https://phabricator.wikimedia.org/P77959 and previous config saved to /var/cache/conftool/dbconfig/20250616-050637-marostegui.json
  • 05:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 05:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 05:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77958 and previous config saved to /var/cache/conftool/dbconfig/20250616-050139-root.json
  • 04:58 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 04:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2204 T396549', diff saved to https://phabricator.wikimedia.org/P77957 and previous config saved to /var/cache/conftool/dbconfig/20250616-045738-marostegui.json
  • 04:52 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2161.codfw.wmnet with reason: Maintenance

2025-06-15

  • 18:09 aokoth@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on doc1003.eqiad.wmnet with reason: Bookworm Migration

2025-06-14

  • 22:38 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:35 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 22:24 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:24 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: T396940 - andrew@cumin1002"
  • 22:23 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: T396940 - andrew@cumin1002"
  • 22:18 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 21:51 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1024.eqiad.wmnet with OS bullseye
  • 21:35 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1024.eqiad.wmnet with reason: host reimage
  • 21:31 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1024.eqiad.wmnet with reason: host reimage
  • 21:16 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1024.eqiad.wmnet with OS bullseye
  • 21:15 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1024.eqiad.wmnet']
  • 21:08 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1024.eqiad.wmnet']
  • 21:08 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cloudcephosd1024.eqiad.wmnet
  • 21:08 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1024.eqiad.wmnet
  • 20:58 andrew@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1024.eqiad.wmnet
  • 20:46 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1024.eqiad.wmnet
  • 19:59 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1023.eqiad.wmnet with OS bullseye
  • 19:45 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1023.eqiad.wmnet with reason: host reimage
  • 19:41 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1023.eqiad.wmnet with reason: host reimage
  • 19:26 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1023.eqiad.wmnet with OS bullseye
  • 19:17 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1023.eqiad.wmnet']
  • 19:11 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1023.eqiad.wmnet']
  • 19:11 andrew@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1023.eqiad.wmnet
  • 19:11 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudcephosd1023.eqiad.wmnet
  • 19:03 andrew@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1023.eqiad.wmnet
  • 18:51 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1023.eqiad.wmnet
  • 13:17 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1022.eqiad.wmnet with OS bullseye
  • 13:01 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: host reimage
  • 12:56 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: host reimage
  • 12:41 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1022.eqiad.wmnet with OS bullseye
  • 12:39 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1022.eqiad.wmnet']
  • 12:29 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1022.eqiad.wmnet']
  • 12:26 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cloudcephosd1022.eqiad.wmnet
  • 12:26 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1022.eqiad.wmnet
  • 12:16 andrew@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1022.eqiad.wmnet
  • 12:01 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1022.eqiad.wmnet
  • 07:39 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-worker1157.eqiad.wmnet
  • 05:10 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1020.eqiad.wmnet with OS bullseye
  • 04:55 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1020.eqiad.wmnet with reason: host reimage
  • 04:52 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1020.eqiad.wmnet with reason: host reimage
  • 04:36 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1020.eqiad.wmnet with OS bullseye
  • 04:35 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1020.eqiad.wmnet']
  • 04:29 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1020.eqiad.wmnet']
  • 04:29 andrew@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1020.eqiad.wmnet
  • 04:21 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1020.eqiad.wmnet
  • 04:09 ryankemper: [WDQS] Restarted blazegraph on `wdqs2009`. Probedown already resolved before the restart, so this might not be necessary, but restarting just in case
  • 00:09 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1186.eqiad.wmnet with OS bullseye
  • 00:08 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye

2025-06-13

  • 23:58 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
  • 23:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
  • 23:31 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
  • 23:28 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 23:27 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 23:22 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
  • 23:16 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
  • 23:15 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1186.eqiad.wmnet with OS bullseye
  • 22:26 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
  • 22:25 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:25 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
  • 22:25 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:24 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:23 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1186.eqiad.wmnet with OS bullseye
  • 22:19 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:18 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
  • 22:14 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 22:14 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 22:12 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
  • 22:11 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:10 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 22:08 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 22:06 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:04 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
  • 22:03 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:03 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1186 - vriley@cumin1002"
  • 22:03 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1186 - vriley@cumin1002"
  • 22:03 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:02 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 22:02 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:00 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1185
  • 22:00 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1185
  • 22:00 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 22:00 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 21:59 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:59 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1185 - vriley@cumin1002"
  • 21:59 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1185 - vriley@cumin1002"
  • 21:59 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 21:53 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 21:52 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 21:51 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 21:49 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 21:49 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 21:48 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 21:48 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 21:40 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 21:40 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 21:30 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 21:00 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1019.eqiad.wmnet with OS bullseye
  • 20:43 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1019.eqiad.wmnet with reason: host reimage
  • 20:40 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1019.eqiad.wmnet with reason: host reimage
  • 20:34 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 20:24 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1019.eqiad.wmnet with OS bullseye
  • 20:24 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 20:19 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1019.eqiad.wmnet with OS bullseye
  • 20:13 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 20:03 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1019.eqiad.wmnet with reason: host reimage
  • 20:03 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 20:00 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1019.eqiad.wmnet with reason: host reimage
  • 19:44 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1019.eqiad.wmnet with OS bullseye
  • 19:41 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1019.eqiad.wmnet']
  • 19:35 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1019.eqiad.wmnet']
  • 19:35 andrew@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1019.eqiad.wmnet
  • 19:35 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudcephosd1019.eqiad.wmnet
  • 19:24 andrew@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1019.eqiad.wmnet
  • 19:23 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1019.eqiad.wmnet
  • 19:21 andrew@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1019.eqiad.wmnet
  • 19:14 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1019.eqiad.wmnet
  • 17:54 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:44 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:40 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:29 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:29 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:29 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:29 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:29 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:28 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:26 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:26 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:26 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:25 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:25 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:24 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:24 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:22 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:22 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:22 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:21 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:21 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:21 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:21 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:21 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:20 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:20 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:18 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:17 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:16 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:16 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:14 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:14 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:14 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:12 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:12 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 16:49 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1157.eqiad.wmnet
  • 16:45 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1159.eqiad.wmnet
  • 16:42 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1159.eqiad.wmnet
  • 16:42 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1158.eqiad.wmnet
  • 16:40 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1158.eqiad.wmnet
  • 16:38 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1157.eqiad.wmnet
  • 16:36 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1157.eqiad.wmnet
  • 16:35 dancy@deploy1003: Installation of scap version "4.174.0" completed for 2 hosts
  • 16:33 dancy@deploy1003: Installing scap version "4.174.0" for 2 host(s)
  • 16:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:13 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:10 brennen@deploy1003: Finished scap sync-world: Backport for Revert "group1 to 1.45.0-wmf.5" (T392175 T396790) (duration: 14m 56s)
  • 16:06 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:06 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:01 brennen@deploy1003: brennen: Continuing with sync
  • 16:00 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 15:59 brennen@deploy1003: brennen: Backport for Revert "group1 to 1.45.0-wmf.5" (T392175 T396790) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:57 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 15:55 brennen@deploy1003: Started scap sync-world: Backport for Revert "group1 to 1.45.0-wmf.5" (T392175 T396790)
  • 15:55 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:54 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:50 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 15:49 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 14:54 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 14:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T396130)', diff saved to https://phabricator.wikimedia.org/P77952 and previous config saved to /var/cache/conftool/dbconfig/20250613-143859-marostegui.json
  • 14:25 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@cab8d81]: hotfix-bump SEAL to v0.9.0 (duration: 02m 26s)
  • 14:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P77951 and previous config saved to /var/cache/conftool/dbconfig/20250613-142351-marostegui.json
  • 14:23 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@cab8d81]: hotfix-bump SEAL to v0.9.0
  • 14:17 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:17 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P77950 and previous config saved to /var/cache/conftool/dbconfig/20250613-140844-marostegui.json
  • 13:57 damilare: SmashPig upgraded from 84c0668b to 4eef974d
  • 13:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T396130)', diff saved to https://phabricator.wikimedia.org/P77949 and previous config saved to /var/cache/conftool/dbconfig/20250613-135336-marostegui.json
  • 13:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T396130)', diff saved to https://phabricator.wikimedia.org/P77948 and previous config saved to /var/cache/conftool/dbconfig/20250613-133900-marostegui.json
  • 13:38 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 13:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T396130)', diff saved to https://phabricator.wikimedia.org/P77947 and previous config saved to /var/cache/conftool/dbconfig/20250613-133837-marostegui.json
  • 13:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P77944 and previous config saved to /var/cache/conftool/dbconfig/20250613-132329-marostegui.json
  • 13:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P77942 and previous config saved to /var/cache/conftool/dbconfig/20250613-130822-marostegui.json
  • 13:05 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1018.eqiad.wmnet with OS bullseye
  • 12:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T396130)', diff saved to https://phabricator.wikimedia.org/P77941 and previous config saved to /var/cache/conftool/dbconfig/20250613-125314-marostegui.json
  • 12:48 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1018.eqiad.wmnet with reason: host reimage
  • 12:44 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1018.eqiad.wmnet with reason: host reimage
  • 12:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77940 and previous config saved to /var/cache/conftool/dbconfig/20250613-123955-root.json
  • 12:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T396130)', diff saved to https://phabricator.wikimedia.org/P77939 and previous config saved to /var/cache/conftool/dbconfig/20250613-123635-marostegui.json
  • 12:36 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 12:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T396130)', diff saved to https://phabricator.wikimedia.org/P77938 and previous config saved to /var/cache/conftool/dbconfig/20250613-123612-marostegui.json
  • 12:28 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1018.eqiad.wmnet with OS bullseye
  • 12:27 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1018.eqiad.wmnet with OS bullseye
  • 12:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77937 and previous config saved to /var/cache/conftool/dbconfig/20250613-122449-root.json
  • 12:21 akosiaris: T390251 re-enable puppet on all registries.
  • 12:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P77936 and previous config saved to /var/cache/conftool/dbconfig/20250613-122104-marostegui.json
  • 12:17 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1018.eqiad.wmnet with OS bullseye
  • 12:15 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cloudcephosd1018.eqiad.wmnet
  • 12:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1018.eqiad.wmnet
  • 12:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P77935 and previous config saved to /var/cache/conftool/dbconfig/20250613-120944-root.json
  • 12:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P77934 and previous config saved to /var/cache/conftool/dbconfig/20250613-120557-marostegui.json
  • 12:05 andrew@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1018.eqiad.wmnet
  • 12:02 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir7004.magru.wmnet with OS bookworm
  • 11:55 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1018.eqiad.wmnet
  • 11:55 andrew@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1018.eqiad.wmnet']
  • 11:54 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1018.eqiad.wmnet']
  • 11:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77933 and previous config saved to /var/cache/conftool/dbconfig/20250613-115438-root.json
  • 11:54 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1018.eqiad.wmnet']
  • 11:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T396130)', diff saved to https://phabricator.wikimedia.org/P77932 and previous config saved to /var/cache/conftool/dbconfig/20250613-115049-marostegui.json
  • 11:49 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 11:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1182', diff saved to https://phabricator.wikimedia.org/P77931 and previous config saved to /var/cache/conftool/dbconfig/20250613-114917-marostegui.json
  • 11:47 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1018.eqiad.wmnet']
  • 11:46 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir7004.magru.wmnet with reason: host reimage
  • 11:45 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:45 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:43 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir7004.magru.wmnet with reason: host reimage
  • 11:41 akosiaris: T390251 re-enable puppet on registry1004 after merging puppet refactoring changes.
  • 11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T396130)', diff saved to https://phabricator.wikimedia.org/P77930 and previous config saved to /var/cache/conftool/dbconfig/20250613-113402-marostegui.json
  • 11:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T396130)', diff saved to https://phabricator.wikimedia.org/P77929 and previous config saved to /var/cache/conftool/dbconfig/20250613-113339-marostegui.json
  • 11:22 marostegui@cumin1002: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 11:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P77928 and previous config saved to /var/cache/conftool/dbconfig/20250613-111832-marostegui.json
  • 11:14 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ncredir7004.magru.wmnet with OS bookworm
  • 11:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P77927 and previous config saved to /var/cache/conftool/dbconfig/20250613-110324-marostegui.json
  • 10:48 root@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-backup1002.eqiad.wmnet: Renew puppet certificate - root@cumin1002
  • 10:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T396130)', diff saved to https://phabricator.wikimedia.org/P77926 and previous config saved to /var/cache/conftool/dbconfig/20250613-104816-marostegui.json
  • 10:45 root@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-backup1001.eqiad.wmnet: Renew puppet certificate - root@cumin1002
  • 10:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T396130)', diff saved to https://phabricator.wikimedia.org/P77925 and previous config saved to /var/cache/conftool/dbconfig/20250613-103137-marostegui.json
  • 10:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 10:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T396130)', diff saved to https://phabricator.wikimedia.org/P77924 and previous config saved to /var/cache/conftool/dbconfig/20250613-103114-marostegui.json
  • 10:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on db2212.codfw.wmnet with reason: Not powering up
  • 10:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P77923 and previous config saved to /var/cache/conftool/dbconfig/20250613-101607-marostegui.json
  • 10:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77922 and previous config saved to /var/cache/conftool/dbconfig/20250613-100754-root.json
  • 10:05 taavi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:05 taavi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add codfw1dev auth v6 VIPs - taavi@cumin1003"
  • 10:05 taavi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add codfw1dev auth v6 VIPs - taavi@cumin1003"
  • 10:02 taavi@cumin1003: START - Cookbook sre.dns.netbox
  • 10:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P77921 and previous config saved to /var/cache/conftool/dbconfig/20250613-100059-marostegui.json
  • 09:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77920 and previous config saved to /var/cache/conftool/dbconfig/20250613-095248-root.json
  • 09:47 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-backup1001.eqiad.wmnet with reason: Maintenance and reboot
  • 09:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T396130)', diff saved to https://phabricator.wikimedia.org/P77919 and previous config saved to /var/cache/conftool/dbconfig/20250613-094552-marostegui.json
  • 09:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P77918 and previous config saved to /var/cache/conftool/dbconfig/20250613-093742-root.json
  • 09:35 jmm@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on install7001.wikimedia.org with reason: being replaced by install7002
  • 09:35 jakob@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 09:35 jakob@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 09:35 jakob@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 09:34 jakob@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 09:34 jakob@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 09:34 jakob@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 09:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T396130)', diff saved to https://phabricator.wikimedia.org/P77917 and previous config saved to /var/cache/conftool/dbconfig/20250613-092910-marostegui.json
  • 09:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T396130)', diff saved to https://phabricator.wikimedia.org/P77916 and previous config saved to /var/cache/conftool/dbconfig/20250613-092847-marostegui.json
  • 09:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77915 and previous config saved to /var/cache/conftool/dbconfig/20250613-092236-root.json
  • 09:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 09:18 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2148', diff saved to https://phabricator.wikimedia.org/P77914 and previous config saved to /var/cache/conftool/dbconfig/20250613-091800-marostegui.json
  • 09:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P77913 and previous config saved to /var/cache/conftool/dbconfig/20250613-091339-marostegui.json
  • 09:12 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-backup1002.eqiad.wmnet with reason: Maintenance and reboot
  • 08:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P77912 and previous config saved to /var/cache/conftool/dbconfig/20250613-085832-marostegui.json
  • 08:56 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:54 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 08:53 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:49 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 08:49 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ncredir7004.magru.wmnet
  • 08:49 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ncredir7004.magru.wmnet with OS bookworm
  • 08:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T396130)', diff saved to https://phabricator.wikimedia.org/P77911 and previous config saved to /var/cache/conftool/dbconfig/20250613-084325-marostegui.json
  • 08:35 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 08:32 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 08:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T396130)', diff saved to https://phabricator.wikimedia.org/P77910 and previous config saved to /var/cache/conftool/dbconfig/20250613-082656-marostegui.json
  • 08:26 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 08:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T396130)', diff saved to https://phabricator.wikimedia.org/P77909 and previous config saved to /var/cache/conftool/dbconfig/20250613-082633-marostegui.json
  • 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P77908 and previous config saved to /var/cache/conftool/dbconfig/20250613-081126-marostegui.json
  • 07:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P77905 and previous config saved to /var/cache/conftool/dbconfig/20250613-075618-marostegui.json
  • 07:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77904 and previous config saved to /var/cache/conftool/dbconfig/20250613-074450-root.json
  • 07:42 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ncredir7004.magru.wmnet with OS bookworm
  • 07:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T396130)', diff saved to https://phabricator.wikimedia.org/P77903 and previous config saved to /var/cache/conftool/dbconfig/20250613-074110-marostegui.json
  • 07:35 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7004.magru.wmnet - jmm@cumin1003"
  • 07:35 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7004.magru.wmnet - jmm@cumin1003"
  • 07:35 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir7004.magru.wmnet on all recursors
  • 07:35 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache ncredir7004.magru.wmnet on all recursors
  • 07:34 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:34 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7004.magru.wmnet - jmm@cumin1003"
  • 07:33 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7004.magru.wmnet - jmm@cumin1003"
  • 07:29 marostegui@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77901 and previous config saved to /var/cache/conftool/dbconfig/20250613-072944-root.json
  • 07:26 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T396130)', diff saved to https://phabricator.wikimedia.org/P77900 and previous config saved to /var/cache/conftool/dbconfig/20250613-072431-marostegui.json
  • 07:24 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T396130)', diff saved to https://phabricator.wikimedia.org/P77899 and previous config saved to /var/cache/conftool/dbconfig/20250613-072408-marostegui.json
  • 07:18 jmm@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 07:16 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 07:16 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host ncredir7004.magru.wmnet
  • 07:14 marostegui@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P77898 and previous config saved to /var/cache/conftool/dbconfig/20250613-071438-root.json
  • 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P77897 and previous config saved to /var/cache/conftool/dbconfig/20250613-070901-marostegui.json
  • 06:59 marostegui@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77896 and previous config saved to /var/cache/conftool/dbconfig/20250613-065933-root.json
  • 06:58 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P77895 and previous config saved to /var/cache/conftool/dbconfig/20250613-065353-marostegui.json
  • 06:53 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 06:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2175', diff saved to https://phabricator.wikimedia.org/P77894 and previous config saved to /var/cache/conftool/dbconfig/20250613-065239-marostegui.json
  • 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T396130)', diff saved to https://phabricator.wikimedia.org/P77893 and previous config saved to /var/cache/conftool/dbconfig/20250613-063845-marostegui.json
  • 06:34 marostegui@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77892 and previous config saved to /var/cache/conftool/dbconfig/20250613-063435-root.json
  • 06:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T396130)', diff saved to https://phabricator.wikimedia.org/P77891 and previous config saved to /var/cache/conftool/dbconfig/20250613-062203-marostegui.json
  • 06:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 06:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T396130)', diff saved to https://phabricator.wikimedia.org/P77890 and previous config saved to /var/cache/conftool/dbconfig/20250613-062140-marostegui.json
  • 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77889 and previous config saved to /var/cache/conftool/dbconfig/20250613-061930-root.json
  • 06:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P77888 and previous config saved to /var/cache/conftool/dbconfig/20250613-060633-marostegui.json
  • 06:04 marostegui@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P77887 and previous config saved to /var/cache/conftool/dbconfig/20250613-060424-root.json
  • 05:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P77885 and previous config saved to /var/cache/conftool/dbconfig/20250613-055125-marostegui.json
  • 05:49 marostegui@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77884 and previous config saved to /var/cache/conftool/dbconfig/20250613-054918-root.json
  • 05:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2189', diff saved to https://phabricator.wikimedia.org/P77883 and previous config saved to /var/cache/conftool/dbconfig/20250613-054156-marostegui.json
  • 05:40 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 05:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T396130)', diff saved to https://phabricator.wikimedia.org/P77882 and previous config saved to /var/cache/conftool/dbconfig/20250613-053617-marostegui.json
  • 05:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T396130)', diff saved to https://phabricator.wikimedia.org/P77881 and previous config saved to /var/cache/conftool/dbconfig/20250613-052114-marostegui.json
  • 05:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 05:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb1013.eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 05:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 03:35 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1017.eqiad.wmnet with OS bullseye
  • 03:18 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1017.eqiad.wmnet with reason: host reimage
  • 03:15 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1017.eqiad.wmnet with reason: host reimage
  • 02:59 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1017.eqiad.wmnet with OS bullseye
  • 02:58 andrew@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1017.eqiad.wmnet']
  • 02:57 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1017.eqiad.wmnet']
  • 02:57 andrew@cumin1002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 02:57 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 02:57 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1017.eqiad.wmnet']
  • 02:41 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1017.eqiad.wmnet']
  • 02:40 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1017.eqiad.wmnet']
  • 02:33 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1017.eqiad.wmnet']
  • 02:31 eileen: postinstall
  • 01:55 eileen: * postinstall
  • 01:17 ladsgroup@cumin2002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1251 gradually with 4 steps - Firmware update done
  • 01:14 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-ulsfo and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
  • 00:31 ladsgroup@cumin2002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1253 gradually with 4 steps - Firmware updated
  • 00:29 ladsgroup@cumin2002: START - Cookbook sre.mysql.pool db1251 gradually with 4 steps - Firmware update done
  • 00:13 ladsgroup@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1251.eqiad.wmnet with reason: Firmware update
  • 00:13 ladsgroup@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts db1251.eqiad.wmnet
  • 00:08 ladsgroup@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1251.eqiad.wmnet

2025-06-12

  • 23:54 ladsgroup@cumin2002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1254 gradually with 4 steps - Firmware update done
  • 23:53 ladsgroup@cumin2002: START - Cookbook sre.hosts.reboot-single for host db1251.eqiad.wmnet
  • 23:53 ladsgroup@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1251.eqiad.wmnet
  • 23:48 ladsgroup@cumin2002: dbctl commit (dc=all): 'Depool db1251 for firmware update (T396648)', diff saved to https://phabricator.wikimedia.org/P77872 and previous config saved to /var/cache/conftool/dbconfig/20250612-234855-ladsgroup.json
  • 23:47 ladsgroup@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1251.eqiad.wmnet with reason: Firmware update
  • 23:43 ladsgroup@cumin2002: START - Cookbook sre.mysql.pool db1253 gradually with 4 steps - Firmware updated
  • 23:43 ladsgroup@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
  • 23:37 ladsgroup@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts db1252.eqiad.wmnet
  • 23:36 ladsgroup@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1252.eqiad.wmnet
  • 23:36 ladsgroup@cumin1002: START - Cookbook sre.wikireplicas.update-views
  • 23:21 ladsgroup@cumin2002: START - Cookbook sre.hosts.reboot-single for host db1252.eqiad.wmnet
  • 23:20 ladsgroup@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1252.eqiad.wmnet
  • 23:14 bvibber@deploy1003: Finished scap sync-world: Backport for Specify Lua transform arguments on invocations (T395610), Specify Lua transform arguments on invocations (T395610) (duration: 61m 18s)
  • 23:13 ladsgroup@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1252.eqiad.wmnet with reason: Firmware update
  • 23:06 ladsgroup@cumin2002: START - Cookbook sre.mysql.pool db1254 gradually with 4 steps - Firmware update done
  • 23:01 ladsgroup@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts db1254.eqiad.wmnet
  • 23:00 ladsgroup@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1254.eqiad.wmnet
  • 23:00 bvibber@deploy1003: bvibber: Continuing with sync
  • 22:59 bvibber@deploy1003: bvibber: Backport for Specify Lua transform arguments on invocations (T395610), Specify Lua transform arguments on invocations (T395610) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:58 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2001.codfw.wmnet with OS bullseye
  • 22:45 ladsgroup@cumin2002: START - Cookbook sre.hosts.reboot-single for host db1254.eqiad.wmnet
  • 22:44 ladsgroup@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1254.eqiad.wmnet
  • 22:43 ladsgroup@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1254.eqiad.wmnet with reason: Firmware upgrade (T396648)
  • 22:43 ladsgroup@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts db1254.eqiad.wmnet
  • 22:42 ladsgroup@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1254.eqiad.wmnet
  • 22:39 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1254.eqiad.wmnet with reason: Firmware upgrade (T396648)
  • 22:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db1254 (T396648)', diff saved to https://phabricator.wikimedia.org/P77867 and previous config saved to /var/cache/conftool/dbconfig/20250612-223834-ladsgroup.json
  • 22:13 bvibber@deploy1003: Started scap sync-world: Backport for Specify Lua transform arguments on invocations (T395610), Specify Lua transform arguments on invocations (T395610)
  • 22:07 maryum: Deploy security fix for T394863
  • 22:00 maryum: Deployed security fix for T396413
  • 21:54 maryum: Deploy security fix for T396524
  • 21:51 cstone: civicrm upgraded from 870eed23 to f2f33db5
  • 21:43 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
  • 21:39 eevans@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
  • 21:24 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host cassandra-dev2001.codfw.wmnet with OS bullseye
  • 21:18 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1016.eqiad.wmnet with OS bullseye
  • 21:15 cstone: SmashPig upgraded from 042d5a5b to 84c0668b
  • 21:01 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: host reimage
  • 20:58 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: host reimage
  • 20:48 bvibber@deploy1003: Finished scap sync-world: Backport for Fix for multiple charts on same page using mix of transforms (T396512), Fix for multiple charts on same page using mix of transforms (T396512) (duration: 09m 50s)
  • 20:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1016.eqiad.wmnet with OS bullseye
  • 20:41 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1016.eqiad.wmnet']
  • 20:41 bvibber@deploy1003: bvibber: Continuing with sync
  • 20:40 bvibber@deploy1003: bvibber: Backport for Fix for multiple charts on same page using mix of transforms (T396512), Fix for multiple charts on same page using mix of transforms (T396512) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:38 bvibber@deploy1003: Started scap sync-world: Backport for Fix for multiple charts on same page using mix of transforms (T396512), Fix for multiple charts on same page using mix of transforms (T396512)
  • 20:36 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1016.eqiad.wmnet']
  • 20:35 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1016.eqiad.wmnet']
  • 20:33 dancy@deploy1003: Finished scap sync-world: Backport for enwiki: temporary lift of IP cap for event on 16 June 2025 (T396128) (duration: 09m 54s)
  • 20:26 dancy@deploy1003: dancy, anzx: Continuing with sync
  • 20:25 dancy@deploy1003: dancy, anzx: Backport for enwiki: temporary lift of IP cap for event on 16 June 2025 (T396128) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:23 dancy@deploy1003: Started scap sync-world: Backport for enwiki: temporary lift of IP cap for event on 16 June 2025 (T396128)
  • 20:18 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1016.eqiad.wmnet']
  • 20:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1015.eqiad.wmnet with OS bullseye
  • 19:51 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1015.eqiad.wmnet with reason: host reimage
  • 19:47 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1015.eqiad.wmnet with reason: host reimage
  • 19:31 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1015.eqiad.wmnet with OS bullseye
  • 19:27 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1015.eqiad.wmnet']
  • 19:19 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1015.eqiad.wmnet']
  • 19:18 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1186.eqiad.wmnet with OS bullseye
  • 19:18 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
  • 19:18 andrew@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1015.eqiad.wmnet']
  • 19:18 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1015.eqiad.wmnet']
  • 19:02 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
  • 19:02 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
  • 18:58 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:57 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:54 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:53 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:41 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1048.eqiad.wmnet
  • 18:40 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T395241)', diff saved to https://phabricator.wikimedia.org/P77866 and previous config saved to /var/cache/conftool/dbconfig/20250612-184028-fceratto.json
  • 18:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 18:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T395241)', diff saved to https://phabricator.wikimedia.org/P77865 and previous config saved to /var/cache/conftool/dbconfig/20250612-183749-fceratto.json
  • 18:37 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.5 refs T392175
  • 18:25 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
  • 18:24 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.5 refs T392175
  • 18:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P77864 and previous config saved to /var/cache/conftool/dbconfig/20250612-182241-fceratto.json
  • 18:10 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
  • 18:10 brennen@deploy1003: Finished scap sync-world: Backport for ParserOutput::collectMetadata: Cast array keys to string (T396656) (duration: 10m 51s)
  • 18:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P77863 and previous config saved to /var/cache/conftool/dbconfig/20250612-180733-fceratto.json
  • 18:06 jasmine@dns1004: END - running authdns-update
  • 18:05 jasmine@dns1004: START - running authdns-update
  • 18:04 jasmine@dns1004: START - running authdns-update
  • 18:03 brennen@deploy1003: brennen: Continuing with sync
  • 18:01 brennen@deploy1003: brennen: Backport for ParserOutput::collectMetadata: Cast array keys to string (T396656) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 17:59 brennen@deploy1003: Started scap sync-world: Backport for ParserOutput::collectMetadata: Cast array keys to string (T396656)
  • 17:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T395241)', diff saved to https://phabricator.wikimedia.org/P77862 and previous config saved to /var/cache/conftool/dbconfig/20250612-175226-fceratto.json
  • 17:50 jasmine@deploy1003: Finished scap sync-world: Deploying apache2 configuration change for T393803 (duration: 20m 58s)
  • 17:44 jasmine@deploy1003: jasmine: Continuing with sync
  • 17:39 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T395241)', diff saved to https://phabricator.wikimedia.org/P77861 and previous config saved to /var/cache/conftool/dbconfig/20250612-173909-fceratto.json
  • 17:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 17:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T395241)', diff saved to https://phabricator.wikimedia.org/P77860 and previous config saved to /var/cache/conftool/dbconfig/20250612-173843-fceratto.json
  • 17:36 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 17:36 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 17:36 jasmine@deploy1003: jasmine: Deploying apache2 configuration change for T393803 synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 17:35 jasmine@deploy1003: Started scap sync-world: Deploying apache2 configuration change for T393803
  • 17:35 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-ulsfo and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
  • 17:25 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2006
  • 17:25 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host sretest2006
  • 17:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding sretest2006 to codfw - jhancock@cumin2002"
  • 17:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding sretest2006 to codfw - jhancock@cumin2002"
  • 17:24 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:23 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P77859 and previous config saved to /var/cache/conftool/dbconfig/20250612-172335-fceratto.json
  • 17:22 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:22 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:22 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:21 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:21 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 17:18 cmooney@cumin1003: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudcephosd1015.eqiad.wmnet
  • 17:13 cmooney@cumin1003: START - Cookbook sre.hosts.dhcp for host cloudcephosd1015.eqiad.wmnet
  • 17:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P77857 and previous config saved to /var/cache/conftool/dbconfig/20250612-170828-fceratto.json
  • 16:56 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1015.eqiad.wmnet with OS bookworm
  • 16:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T395241)', diff saved to https://phabricator.wikimedia.org/P77856 and previous config saved to /var/cache/conftool/dbconfig/20250612-165320-fceratto.json
  • 16:49 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:49 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2001.codfw.wmnet with OS bullseye
  • 16:48 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1015.eqiad.wmnet with OS bookworm
  • 16:43 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:35 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T395241)', diff saved to https://phabricator.wikimedia.org/P77854 and previous config saved to /var/cache/conftool/dbconfig/20250612-163536-fceratto.json
  • 16:35 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 16:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T395241)', diff saved to https://phabricator.wikimedia.org/P77853 and previous config saved to /var/cache/conftool/dbconfig/20250612-163509-fceratto.json
  • 16:35 volans@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2241.mgmt.codfw.wmnet db2242.mgmt.codfw.wmnet on all recursors
  • 16:35 volans@cumin1003: START - Cookbook sre.dns.wipe-cache db2241.mgmt.codfw.wmnet db2242.mgmt.codfw.wmnet on all recursors
  • 16:31 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
  • 16:31 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:30 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:30 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:30 volans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:30 volans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Invert db2241 and db2242 DNS T379757#10908710 - volans@cumin1003"
  • 16:30 volans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Invert db2241 and db2242 DNS T379757#10908710 - volans@cumin1003"
  • 16:29 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:26 volans@cumin1003: START - Cookbook sre.dns.netbox
  • 16:26 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1015.eqiad.wmnet with OS bullseye
  • 16:25 eevans@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
  • 16:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P77852 and previous config saved to /var/cache/conftool/dbconfig/20250612-162002-fceratto.json
  • 16:19 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1015.eqiad.wmnet with OS bullseye
  • 16:18 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1015.eqiad.wmnet with OS bullseye
  • 16:14 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1015.eqiad.wmnet with OS bullseye
  • 16:10 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1015.eqiad.wmnet with OS bullseye
  • 16:10 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host cassandra-dev2001.codfw.wmnet with OS bullseye
  • 16:05 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1015.eqiad.wmnet with OS bullseye
  • 16:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P77851 and previous config saved to /var/cache/conftool/dbconfig/20250612-160454-fceratto.json
  • 15:58 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1015.eqiad.wmnet']
  • 15:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T395241)', diff saved to https://phabricator.wikimedia.org/P77850 and previous config saved to /var/cache/conftool/dbconfig/20250612-154947-fceratto.json
  • 15:49 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts db1253.eqiad.wmnet
  • 15:49 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host db1253.eqiad.wmnet
  • 15:45 swfrench-wmf: removed python3-conftool-dbctl package from puppetmaster[12]001 - T395696
  • 15:44 logmsgbot: lucaswerkmeister-wmde Deployed security patch for T396685
  • 15:40 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1015.eqiad.wmnet']
  • 15:40 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1015.eqiad.wmnet with OS bullseye
  • 15:37 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:35 logmsgbot: lucaswerkmeister-wmde Deployed security patch for T396685
  • 15:34 robh@cumin2002: START - Cookbook sre.hosts.reboot-single for host db1253.eqiad.wmnet
  • 15:32 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1253.eqiad.wmnet
  • 15:32 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1015.eqiad.wmnet with OS bullseye
  • 15:31 andrew@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1015.eqiad.wmnet']
  • 15:30 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T395241)', diff saved to https://phabricator.wikimedia.org/P77848 and previous config saved to /var/cache/conftool/dbconfig/20250612-153008-fceratto.json
  • 15:30 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 15:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T395241)', diff saved to https://phabricator.wikimedia.org/P77847 and previous config saved to /var/cache/conftool/dbconfig/20250612-152942-fceratto.json
  • 15:29 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts db1253.eqiad.wmnet
  • 15:29 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1253.eqiad.wmnet
  • 15:29 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts db1253.eqiad.wmnet
  • 15:28 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1253.eqiad.wmnet
  • 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:25 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts db1253.eqiad.wmnet
  • 15:25 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1253.eqiad.wmnet
  • 15:25 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:25 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2057
  • 15:25 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2057
  • 15:24 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1015.eqiad.wmnet']
  • 15:24 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1015.eqiad.wmnet with OS bullseye
  • 15:23 urbanecm@deploy1003: Finished scap sync-world: Backport for LinkRecommendationStore: Query templatelinks on the main DB (T396680) (duration: 18m 06s)
  • 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2057
  • 15:19 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1015.eqiad.wmnet with OS bullseye
  • 15:19 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1015.eqiad.wmnet with OS bullseye
  • 15:16 urbanecm@deploy1003: urbanecm: Continuing with sync
  • 15:15 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts db1253.eqiad.wmnet
  • 15:14 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1253.eqiad.wmnet
  • 15:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P77846 and previous config saved to /var/cache/conftool/dbconfig/20250612-151434-fceratto.json
  • 15:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2057
  • 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2056
  • 15:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2056
  • 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2055
  • 15:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2055
  • 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2054
  • 15:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2054
  • 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2053
  • 15:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2053
  • 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2052
  • 15:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2052
  • 15:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2051
  • 15:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2051
  • 15:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2050
  • 15:12 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1015.eqiad.wmnet with OS bullseye
  • 15:12 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1048.eqiad.wmnet
  • 15:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2050
  • 15:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2049
  • 15:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2049
  • 15:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2048
  • 15:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2048
  • 15:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2047
  • 15:11 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2047
  • 15:11 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2046
  • 15:11 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2046
  • 15:11 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2045
  • 15:11 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2045
  • 15:11 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:11 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cp2045-57 to codfw - jhancock@cumin2002"
  • 15:11 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cp2045-57 to codfw - jhancock@cumin2002"
  • 15:09 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 15:09 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 15:09 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 15:09 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 15:09 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:09 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:09 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 15:09 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 15:09 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs7002.magru.wmnet} and A:liberica
  • 15:09 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 15:09 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 15:08 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 15:08 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 15:08 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs7002.magru.wmnet} and A:liberica
  • 15:08 vgutierrez: re-pooling lvs7002 using katran - T396561
  • 15:08 urbanecm@deploy1003: urbanecm: Backport for LinkRecommendationStore: Query templatelinks on the main DB (T396680) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:07 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:05 urbanecm@deploy1003: Started scap sync-world: Backport for LinkRecommendationStore: Query templatelinks on the main DB (T396680)
  • 15:03 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 15:03 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 15:03 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 15:03 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 15:03 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:03 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:03 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 15:03 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 15:03 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 15:03 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 15:03 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 15:03 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 15:01 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1015.eqiad.wmnet']
  • 14:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P77845 and previous config saved to /var/cache/conftool/dbconfig/20250612-145927-fceratto.json
  • 14:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update for codfw - jhancock@cumin2002"
  • 14:58 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 14:58 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 14:58 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 14:58 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 14:57 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 14:57 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 14:57 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 14:57 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 14:57 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 14:57 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 14:57 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 14:57 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 14:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update for codfw - jhancock@cumin2002"
  • 14:54 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1015.eqiad.wmnet']
  • 14:52 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:49 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1014.eqiad.wmnet with OS bullseye
  • 14:48 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs7002.magru.wmnet
  • 14:48 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs7002.magru.wmnet
  • 14:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T395241)', diff saved to https://phabricator.wikimedia.org/P77843 and previous config saved to /var/cache/conftool/dbconfig/20250612-144419-fceratto.json
  • 14:38 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:37 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:37 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:37 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:37 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:36 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:36 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:35 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:30 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@cb6b18b]: hotfix-bump SEAL to v0.8.0 (duration: 02m 24s)
  • 14:28 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@cb6b18b]: hotfix-bump SEAL to v0.8.0
  • 14:28 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2033.codfw.wmnet
  • 14:28 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2033.codfw.wmnet
  • 14:27 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T395241)', diff saved to https://phabricator.wikimedia.org/P77842 and previous config saved to /var/cache/conftool/dbconfig/20250612-142738-fceratto.json
  • 14:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 14:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T395241)', diff saved to https://phabricator.wikimedia.org/P77841 and previous config saved to /var/cache/conftool/dbconfig/20250612-142712-fceratto.json
  • 14:24 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1014.eqiad.wmnet with reason: host reimage
  • 14:21 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1014.eqiad.wmnet with reason: host reimage
  • 14:20 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2033.codfw.wmnet
  • 14:12 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2033.codfw.wmnet
  • 14:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P77840 and previous config saved to /var/cache/conftool/dbconfig/20250612-141205-fceratto.json
  • 14:04 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1014.eqiad.wmnet with OS bullseye
  • 14:02 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 13:57 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 13:57 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 13:57 vgutierrez: upload liberica 0.18 to apt.wm.o (bookworm-wikimedia) - T396751
  • 13:57 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 13:57 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 13:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P77839 and previous config saved to /var/cache/conftool/dbconfig/20250612-135657-fceratto.json
  • 13:56 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 13:55 andrew@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 13:46 moritzm: installing mariadb security updates (as shipped in Debian, not the wmf-mariadb packages we use for the main mariadb clusters)
  • 13:46 gkyziridis@deploy1003: Finished scap sync-world: Backport for ores-extension: enable ores extension UI for second batch of wikis (T395823) (duration: 11m 00s)
  • 13:45 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 13:45 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 13:45 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2034.codfw.wmnet
  • 13:44 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 13:44 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2034.codfw.wmnet
  • 13:44 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 13:44 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 13:44 andrew@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 13:43 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 13:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T395241)', diff saved to https://phabricator.wikimedia.org/P77838 and previous config saved to /var/cache/conftool/dbconfig/20250612-134149-fceratto.json
  • 13:41 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 13:41 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 13:41 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1014.eqiad.wmnet with OS bullseye
  • 13:39 gkyziridis@deploy1003: gkyziridis, isaranto: Continuing with sync
  • 13:37 moritzm: failover Ganeti master in eqiad to ganeti1046
  • 13:37 gkyziridis@deploy1003: gkyziridis, isaranto: Backport for ores-extension: enable ores extension UI for second batch of wikis (T395823) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:36 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2034.codfw.wmnet
  • 13:36 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1014.eqiad.wmnet with OS bullseye
  • 13:36 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
  • 13:35 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2034.codfw.wmnet
  • 13:35 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
  • 13:35 andrew@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 13:35 gkyziridis@deploy1003: Started scap sync-world: Backport for ores-extension: enable ores extension UI for second batch of wikis (T395823)
  • 13:30 gkyziridis@deploy1003: Finished scap sync-world: Backport for Revert "ores-extension: enable oresUI for the second batch of wikis" (duration: 10m 01s)
  • 13:28 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 13:27 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 13:26 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade of ATS on P{cp7002*} and A:cp - 9.2.10 upgrade (T390912)
  • 13:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T395241)', diff saved to https://phabricator.wikimedia.org/P77837 and previous config saved to /var/cache/conftool/dbconfig/20250612-132356-fceratto.json
  • 13:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 13:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T395241)', diff saved to https://phabricator.wikimedia.org/P77836 and previous config saved to /var/cache/conftool/dbconfig/20250612-132329-fceratto.json
  • 13:23 gkyziridis@deploy1003: gkyziridis: Continuing with sync
  • 13:22 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 13:22 gkyziridis@deploy1003: gkyziridis: Backport for Revert "ores-extension: enable oresUI for the second batch of wikis" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:22 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 13:21 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade of ATS on P{cp7002*} and A:cp - 9.2.10 upgrade (T390912)
  • 13:21 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade of ATS on P{cp3081*} and A:cp - 9.2.10 upgrade (T390912)
  • 13:20 gehel: depooling wdqs1022, it seems to not be updated - T396577
  • 13:20 gehel: depooling wdqs1022, it seems to not be updated
  • 13:20 gkyziridis@deploy1003: Started scap sync-world: Backport for Revert "ores-extension: enable oresUI for the second batch of wikis"
  • 13:18 gkyziridis@deploy1003: Sync cancelled.
  • 13:16 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade of ATS on P{cp3081*} and A:cp - 9.2.10 upgrade (T390912)
  • 13:10 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 13:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P77835 and previous config saved to /var/cache/conftool/dbconfig/20250612-130822-fceratto.json
  • 13:06 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 13:06 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 13:06 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2034.codfw.wmnet
  • 13:06 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
  • 13:06 gkyziridis@deploy1003: gkyziridis: Backport for ores-extension: enable oresUI for the second batch of wikis (T395823 T395668) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:05 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 13:05 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 13:05 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1014.eqiad.wmnet with OS bullseye
  • 13:04 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2030.codfw.wmnet
  • 13:04 gkyziridis@deploy1003: Started scap sync-world: Backport for ores-extension: enable oresUI for the second batch of wikis (T395823 T395668)
  • 13:03 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to 17.10
  • 13:01 vgutierrez@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs7002.magru.wmnet with reason: switching to katran
  • 13:01 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host ncredir7004.magru.wmnet
  • 13:01 jmm@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ncredir7004.magru.wmnet with OS bookworm
  • 13:00 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1014.eqiad.wmnet with OS bullseye
  • 12:59 andrew@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 12:59 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ncredir7004.magru.wmnet with OS bookworm
  • 12:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P77834 and previous config saved to /var/cache/conftool/dbconfig/20250612-125314-fceratto.json
  • 12:53 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7004.magru.wmnet - jmm@cumin1003"
  • 12:52 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7004.magru.wmnet - jmm@cumin1003"
  • 12:52 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir7004.magru.wmnet on all recursors
  • 12:52 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache ncredir7004.magru.wmnet on all recursors
  • 12:52 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:51 vgutierrez@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs7002.magru.wmnet} and A:liberica (T396561)
  • 12:51 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2030.codfw.wmnet
  • 12:51 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.admin depooling P{lvs7002.magru.wmnet} and A:liberica (T396561)
  • 12:51 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 12:51 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 12:50 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 12:50 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 12:50 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
  • 12:49 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 12:49 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host ncredir7004.magru.wmnet
  • 12:49 vgutierrez: depooling lvs7002 before migrating to katran - T396561
  • 12:48 andrew@cumin1002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cloudcephosd1015.eqiad.wmnet
  • 12:48 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1015.eqiad.wmnet
  • 12:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T395241)', diff saved to https://phabricator.wikimedia.org/P77833 and previous config saved to /var/cache/conftool/dbconfig/20250612-123806-fceratto.json
  • 12:37 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1014.eqiad.wmnet with OS bullseye
  • 12:27 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1031.eqiad.wmnet
  • 12:27 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1031.eqiad.wmnet
  • 12:20 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1031.eqiad.wmnet
  • 12:11 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T395241)', diff saved to https://phabricator.wikimedia.org/P77831 and previous config saved to /var/cache/conftool/dbconfig/20250612-121141-fceratto.json
  • 12:11 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 12:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T395241)', diff saved to https://phabricator.wikimedia.org/P77830 and previous config saved to /var/cache/conftool/dbconfig/20250612-121125-fceratto.json
  • 12:10 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
  • 12:03 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2046.codfw.wmnet to cluster codfw and group A
  • 12:01 jmm@cumin1003: START - Cookbook sre.ganeti.addnode for new host ganeti2046.codfw.wmnet to cluster codfw and group A
  • 12:00 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2045.codfw.wmnet to cluster codfw and group A
  • 11:58 jmm@cumin1003: START - Cookbook sre.ganeti.addnode for new host ganeti2045.codfw.wmnet to cluster codfw and group A
  • 11:56 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2046.codfw.wmnet
  • 11:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P77829 and previous config saved to /var/cache/conftool/dbconfig/20250612-115618-fceratto.json
  • 11:55 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ncredir7004.magru.wmnet
  • 11:55 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ncredir7004.magru.wmnet with OS bookworm
  • 11:49 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2046.codfw.wmnet
  • 11:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P77828 and previous config saved to /var/cache/conftool/dbconfig/20250612-114110-fceratto.json
  • 11:40 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1052.eqiad.wmnet
  • 11:40 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1052.eqiad.wmnet
  • 11:35 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1014.eqiad.wmnet with OS bullseye
  • 11:35 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1014.eqiad.wmnet with OS bullseye
  • 11:34 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1052.eqiad.wmnet
  • 11:31 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2045.codfw.wmnet
  • 11:30 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1052.eqiad.wmnet
  • 11:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T395241)', diff saved to https://phabricator.wikimedia.org/P77826 and previous config saved to /var/cache/conftool/dbconfig/20250612-112602-fceratto.json
  • 11:24 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2045.codfw.wmnet
  • 11:23 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:17 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 11:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T396130)', diff saved to https://phabricator.wikimedia.org/P77825 and previous config saved to /var/cache/conftool/dbconfig/20250612-111722-marostegui.json
  • 11:14 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T395241)', diff saved to https://phabricator.wikimedia.org/P77824 and previous config saved to /var/cache/conftool/dbconfig/20250612-111423-fceratto.json
  • 11:14 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 11:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T395241)', diff saved to https://phabricator.wikimedia.org/P77823 and previous config saved to /var/cache/conftool/dbconfig/20250612-111357-fceratto.json
  • 11:10 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1051.eqiad.wmnet
  • 11:10 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1051.eqiad.wmnet
  • 11:10 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main'.
  • 11:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main'.
  • 11:07 vgutierrez: use Google Trust Services (GTS) unified TLS certificate on drmrs - T395131
  • 11:07 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1014.eqiad.wmnet with OS bullseye
  • 11:05 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1051.eqiad.wmnet
  • 11:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main'.
  • 11:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P77822 and previous config saved to /var/cache/conftool/dbconfig/20250612-110213-marostegui.json
  • 11:01 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ncredir7004.magru.wmnet with OS bookworm
  • 11:00 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7004.magru.wmnet - jmm@cumin1003"
  • 11:00 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main'.
  • 11:00 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7004.magru.wmnet - jmm@cumin1003"
  • 11:00 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir7004.magru.wmnet on all recursors
  • 10:59 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache ncredir7004.magru.wmnet on all recursors
  • 10:59 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:59 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7004.magru.wmnet - jmm@cumin1003"
  • 10:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P77821 and previous config saved to /var/cache/conftool/dbconfig/20250612-105848-fceratto.json
  • 10:57 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1051.eqiad.wmnet
  • 10:56 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1050.eqiad.wmnet
  • 10:56 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1050.eqiad.wmnet
  • 10:50 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1050.eqiad.wmnet
  • 10:50 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7004.magru.wmnet - jmm@cumin1003"
  • 10:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main'.
  • 10:47 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1050.eqiad.wmnet
  • 10:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P77820 and previous config saved to /var/cache/conftool/dbconfig/20250612-104706-marostegui.json
  • 10:44 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1049.eqiad.wmnet
  • 10:43 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1049.eqiad.wmnet
  • 10:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P77819 and previous config saved to /var/cache/conftool/dbconfig/20250612-104341-fceratto.json
  • 10:43 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 10:43 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host ncredir7004.magru.wmnet
  • 10:42 jmm@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti2050.codfw.wmnet with OS bookworm
  • 10:38 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1049.eqiad.wmnet
  • 10:36 cgoubert@deploy1003: Finished scap sync-world: 1156288: mediawiki: Add job history limit control - T395885 (duration: 02m 48s)
  • 10:33 cgoubert@deploy1003: Started scap sync-world: 1156288: mediawiki: Add job history limit control - T395885
  • 10:32 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1049.eqiad.wmnet
  • 10:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T396130)', diff saved to https://phabricator.wikimedia.org/P77818 and previous config saved to /var/cache/conftool/dbconfig/20250612-103159-marostegui.json
  • 10:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T395241)', diff saved to https://phabricator.wikimedia.org/P77817 and previous config saved to /var/cache/conftool/dbconfig/20250612-102834-fceratto.json
  • 10:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T396130)', diff saved to https://phabricator.wikimedia.org/P77816 and previous config saved to /var/cache/conftool/dbconfig/20250612-102700-marostegui.json
  • 10:26 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:26 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 10:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T396130)', diff saved to https://phabricator.wikimedia.org/P77815 and previous config saved to /var/cache/conftool/dbconfig/20250612-102630-marostegui.json
  • 10:25 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti2050.codfw.wmnet with OS bookworm
  • 10:24 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ncredir7004.magru.wmnet
  • 10:23 jmm@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:23 jmm@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti2050.codfw.wmnet with OS bookworm
  • 10:16 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T395241)', diff saved to https://phabricator.wikimedia.org/P77814 and previous config saved to /var/cache/conftool/dbconfig/20250612-101655-fceratto.json
  • 10:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 10:14 moritzm: installing Kerberos security updates
  • 10:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P77813 and previous config saved to /var/cache/conftool/dbconfig/20250612-101123-marostegui.json
  • 10:11 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 10:07 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 10:07 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host ncredir7004.magru.wmnet
  • 10:06 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti2050.codfw.wmnet with OS bookworm
  • 09:53 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2049.codfw.wmnet with OS bookworm
  • 09:50 esanders@deploy1003: Finished scap sync-world: Backport for Support placeholders mangled by MF's HtmlFormatter (T396695) (duration: 10m 37s)
  • 09:46 cmooney@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet
  • 09:43 esanders@deploy1003: esanders: Continuing with sync
  • 09:42 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ncredir7004.magru.wmnet
  • 09:42 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir7004.magru.wmnet on all recursors
  • 09:42 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache ncredir7004.magru.wmnet on all recursors
  • 09:42 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:42 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM ncredir7004.magru.wmnet - jmm@cumin1003"
  • 09:41 esanders@deploy1003: esanders: Backport for Support placeholders mangled by MF's HtmlFormatter (T396695) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:41 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti1047.eqiad.wmnet with reason: hw check
  • 09:41 cmooney@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet
  • 09:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T396130)', diff saved to https://phabricator.wikimedia.org/P77811 and previous config saved to /var/cache/conftool/dbconfig/20250612-094109-marostegui.json
  • 09:39 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2002.codfw.wmnet
  • 09:39 esanders@deploy1003: Started scap sync-world: Backport for Support placeholders mangled by MF's HtmlFormatter (T396695)
  • 09:39 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM ncredir7004.magru.wmnet - jmm@cumin1003"
  • 09:37 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2049.codfw.wmnet with reason: host reimage
  • 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T396130)', diff saved to https://phabricator.wikimedia.org/P77809 and previous config saved to /var/cache/conftool/dbconfig/20250612-093631-marostegui.json
  • 09:36 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T396130)', diff saved to https://phabricator.wikimedia.org/P77808 and previous config saved to /var/cache/conftool/dbconfig/20250612-093609-marostegui.json
  • 09:34 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2049.codfw.wmnet with reason: host reimage
  • 09:32 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan2002.codfw.wmnet
  • 09:29 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 09:28 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir7004.magru.wmnet on all recursors
  • 09:28 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache ncredir7004.magru.wmnet on all recursors
  • 09:28 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:26 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 09:26 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host ncredir7004.magru.wmnet
  • 09:24 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan1002.eqiad.wmnet
  • 09:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P77806 and previous config saved to /var/cache/conftool/dbconfig/20250612-092103-marostegui.json
  • 09:20 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:19 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ncredir7004.magru.wmnet
  • 09:19 jmm@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 09:15 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan1002.eqiad.wmnet
  • 09:11 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana1002.eqiad.wmnet
  • 09:08 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to 17.10
  • 09:07 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host grafana1002.eqiad.wmnet
  • 09:07 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade Replica to GitLab 17.10
  • 09:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P77805 and previous config saved to /var/cache/conftool/dbconfig/20250612-090555-marostegui.json
  • 09:05 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti2049.codfw.wmnet with OS bookworm
  • 09:04 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1002.eqiad.wmnet
  • 09:04 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 09:04 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host ncredir7004.magru.wmnet
  • 09:04 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on install7001.wikimedia.org with reason: migration to install7002
  • 08:57 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog1002.eqiad.wmnet
  • 08:57 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade Replica to GitLab 17.10
  • 08:56 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host ncredir7004.magru.wmnet
  • 08:56 jmm@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ncredir7004.magru.wmnet with OS bookworm
  • 08:56 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 17.10
  • 08:54 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2048.codfw.wmnet with OS bookworm
  • 08:54 marostegui@cumin1002: dbctl commit (dc=all): 'es1046 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77804 and previous config saved to /var/cache/conftool/dbconfig/20250612-085359-root.json
  • 08:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T396130)', diff saved to https://phabricator.wikimedia.org/P77803 and previous config saved to /var/cache/conftool/dbconfig/20250612-085048-marostegui.json
  • 08:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:46 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 17.10
  • 08:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1189 (T396130)', diff saved to https://phabricator.wikimedia.org/P77802 and previous config saved to /var/cache/conftool/dbconfig/20250612-084611-marostegui.json
  • 08:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 08:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T396130)', diff saved to https://phabricator.wikimedia.org/P77801 and previous config saved to /var/cache/conftool/dbconfig/20250612-084600-marostegui.json
  • 08:38 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2048.codfw.wmnet with reason: host reimage
  • 08:38 marostegui@cumin1002: dbctl commit (dc=all): 'es1046 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77800 and previous config saved to /var/cache/conftool/dbconfig/20250612-083854-root.json
  • 08:35 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2048.codfw.wmnet with reason: host reimage
  • 08:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P77799 and previous config saved to /var/cache/conftool/dbconfig/20250612-083053-marostegui.json
  • 08:29 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:26 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:23 marostegui@cumin1002: dbctl commit (dc=all): 'es1046 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77798 and previous config saved to /var/cache/conftool/dbconfig/20250612-082348-root.json
  • 08:23 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ncredir7004.magru.wmnet with OS bookworm
  • 08:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2225 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77797 and previous config saved to /var/cache/conftool/dbconfig/20250612-082223-root.json
  • 08:22 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti2048.codfw.wmnet with OS bookworm
  • 08:19 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7004.magru.wmnet - jmm@cumin1003"
  • 08:19 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7004.magru.wmnet - jmm@cumin1003"
  • 08:19 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir7004.magru.wmnet on all recursors
  • 08:19 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache ncredir7004.magru.wmnet on all recursors
  • 08:19 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:19 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7004.magru.wmnet - jmm@cumin1003"
  • 08:19 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7004.magru.wmnet - jmm@cumin1003"
  • 08:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P77796 and previous config saved to /var/cache/conftool/dbconfig/20250612-081546-marostegui.json
  • 08:15 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:12 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:12 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:12 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:11 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:11 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:10 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:08 marostegui@cumin1002: dbctl commit (dc=all): 'es1046 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77795 and previous config saved to /var/cache/conftool/dbconfig/20250612-080843-root.json
  • 08:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2225 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77794 and previous config saved to /var/cache/conftool/dbconfig/20250612-080717-root.json
  • 08:01 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 08:01 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host ncredir7004.magru.wmnet
  • 08:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T396130)', diff saved to https://phabricator.wikimedia.org/P77793 and previous config saved to /var/cache/conftool/dbconfig/20250612-080039-marostegui.json
  • 07:59 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ncredir7001.magru.wmnet
  • 07:59 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:59 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 07:58 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 07:58 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2047.codfw.wmnet with OS bookworm
  • 07:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T396130)', diff saved to https://phabricator.wikimedia.org/P77791 and previous config saved to /var/cache/conftool/dbconfig/20250612-075501-marostegui.json
  • 07:54 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T396130)', diff saved to https://phabricator.wikimedia.org/P77790 and previous config saved to /var/cache/conftool/dbconfig/20250612-075437-marostegui.json
  • 07:53 marostegui@cumin1002: dbctl commit (dc=all): 'es1046 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77789 and previous config saved to /var/cache/conftool/dbconfig/20250612-075338-root.json
  • 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2225 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P77788 and previous config saved to /var/cache/conftool/dbconfig/20250612-075211-root.json
  • 07:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 07:49 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 07:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1046.eqiad.wmnet with reason: Maintenance
  • 07:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1046', diff saved to https://phabricator.wikimedia.org/P77787 and previous config saved to /var/cache/conftool/dbconfig/20250612-074624-marostegui.json
  • 07:45 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts ncredir7001.magru.wmnet
  • 07:44 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus6002.drmrs.wmnet
  • 07:44 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus5002.eqsin.wmnet
  • 07:43 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2047.codfw.wmnet with reason: host reimage
  • 07:40 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2047.codfw.wmnet with reason: host reimage
  • 07:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P77786 and previous config saved to /var/cache/conftool/dbconfig/20250612-073930-marostegui.json
  • 07:38 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus6002.drmrs.wmnet
  • 07:38 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus5002.eqsin.wmnet
  • 07:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2225 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77785 and previous config saved to /var/cache/conftool/dbconfig/20250612-073705-root.json
  • 07:34 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus4002.ulsfo.wmnet
  • 07:30 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus3003.esams.wmnet
  • 07:28 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2225.codfw.wmnet with reason: Maintenance
  • 07:28 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus4002.ulsfo.wmnet
  • 07:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2225 T396549', diff saved to https://phabricator.wikimedia.org/P77784 and previous config saved to /var/cache/conftool/dbconfig/20250612-072827-marostegui.json
  • 07:28 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus3003.esams.wmnet
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P77783 and previous config saved to /var/cache/conftool/dbconfig/20250612-072422-marostegui.json
  • 07:23 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2008.codfw.wmnet
  • 07:23 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1008.eqiad.wmnet
  • 07:15 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2008.codfw.wmnet
  • 07:15 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus1008.eqiad.wmnet
  • 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T396130)', diff saved to https://phabricator.wikimedia.org/P77782 and previous config saved to /var/cache/conftool/dbconfig/20250612-070914-marostegui.json
  • 07:08 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
  • 07:07 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
  • 07:04 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti2047.codfw.wmnet with OS bookworm
  • 07:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T396130)', diff saved to https://phabricator.wikimedia.org/P77781 and previous config saved to /var/cache/conftool/dbconfig/20250612-070427-marostegui.json
  • 07:04 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 07:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T396130)', diff saved to https://phabricator.wikimedia.org/P77780 and previous config saved to /var/cache/conftool/dbconfig/20250612-070405-marostegui.json
  • 06:58 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
  • 06:57 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
  • 06:50 marostegui@cumin1002: dbctl commit (dc=all): 'es2026 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77779 and previous config saved to /var/cache/conftool/dbconfig/20250612-065028-root.json
  • 06:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P77778 and previous config saved to /var/cache/conftool/dbconfig/20250612-064858-marostegui.json
  • 06:42 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1225.eqiad.wmnet,dbprov1004.eqiad.wmnet with reason: Downtime hosts for MariaDB 10.11 upgrade
  • 06:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77777 and previous config saved to /var/cache/conftool/dbconfig/20250612-063755-root.json
  • 06:35 marostegui@cumin1002: dbctl commit (dc=all): 'es2026 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77776 and previous config saved to /var/cache/conftool/dbconfig/20250612-063522-root.json
  • 06:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P77775 and previous config saved to /var/cache/conftool/dbconfig/20250612-063350-marostegui.json
  • 06:25 marostegui@cumin1002: dbctl commit (dc=all): 'db2226 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77774 and previous config saved to /var/cache/conftool/dbconfig/20250612-062546-root.json
  • 06:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77773 and previous config saved to /var/cache/conftool/dbconfig/20250612-062248-root.json
  • 06:20 marostegui@cumin1002: dbctl commit (dc=all): 'es2026 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77772 and previous config saved to /var/cache/conftool/dbconfig/20250612-062016-root.json
  • 06:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T396130)', diff saved to https://phabricator.wikimedia.org/P77771 and previous config saved to /var/cache/conftool/dbconfig/20250612-061843-marostegui.json
  • 06:17 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1207 from dbctl T396697', diff saved to https://phabricator.wikimedia.org/P77770 and previous config saved to /var/cache/conftool/dbconfig/20250612-061700-marostegui.json
  • 06:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1157 (T396130)', diff saved to https://phabricator.wikimedia.org/P77769 and previous config saved to /var/cache/conftool/dbconfig/20250612-061405-marostegui.json
  • 06:13 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 06:10 marostegui@cumin1002: dbctl commit (dc=all): 'db2226 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77767 and previous config saved to /var/cache/conftool/dbconfig/20250612-061041-root.json
  • 06:10 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1217.eqiad.wmnet with reason: Maintenance
  • 06:07 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 06:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P77766 and previous config saved to /var/cache/conftool/dbconfig/20250612-060743-root.json
  • 06:05 marostegui@cumin1002: dbctl commit (dc=all): 'es2026 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77765 and previous config saved to /var/cache/conftool/dbconfig/20250612-060510-root.json
  • 05:55 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 05:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2226 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P77764 and previous config saved to /var/cache/conftool/dbconfig/20250612-055535-root.json
  • 05:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1207 T396697', diff saved to https://phabricator.wikimedia.org/P77763 and previous config saved to /var/cache/conftool/dbconfig/20250612-055439-marostegui.json
  • 05:53 marostegui@cumin1002: dbctl commit (dc=all): 'Pool db1206', diff saved to https://phabricator.wikimedia.org/P77762 and previous config saved to /var/cache/conftool/dbconfig/20250612-055339-marostegui.json
  • 05:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1206 T396697', diff saved to https://phabricator.wikimedia.org/P77761 and previous config saved to /var/cache/conftool/dbconfig/20250612-055318-marostegui.json
  • 05:52 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77760 and previous config saved to /var/cache/conftool/dbconfig/20250612-055237-root.json
  • 05:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1184 T396697', diff saved to https://phabricator.wikimedia.org/P77759 and previous config saved to /var/cache/conftool/dbconfig/20250612-055136-marostegui.json
  • 05:50 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2209.codfw.wmnet with reason: Maintenance
  • 05:50 marostegui@cumin1002: dbctl commit (dc=all): 'es2026 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77758 and previous config saved to /var/cache/conftool/dbconfig/20250612-055005-root.json
  • 05:43 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2026.codfw.wmnet with reason: Maintenance
  • 05:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2026', diff saved to https://phabricator.wikimedia.org/P77757 and previous config saved to /var/cache/conftool/dbconfig/20250612-054315-marostegui.json
  • 05:40 marostegui@cumin1002: dbctl commit (dc=all): 'db2226 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77756 and previous config saved to /var/cache/conftool/dbconfig/20250612-054030-root.json
  • 05:35 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2226.codfw.wmnet with reason: Maintenance
  • 05:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2226', diff saved to https://phabricator.wikimedia.org/P77755 and previous config saved to /var/cache/conftool/dbconfig/20250612-053450-marostegui.json
  • 05:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2239.codfw.wmnet with reason: Maintenance
  • 04:27 TimStarling: ran cleanupBlocks.php on all wikis for T373847 and T389301
  • 03:52 eileen: config revision changed from 724b1679 to df8bc7dd

2025-06-11

  • 22:53 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1253.eqiad.wmnet with reason: Firmware upgrade (T396648)
  • 22:49 ladsgroup@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts db1253.eqiad.wmnet
  • 22:48 ladsgroup@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1253.eqiad.wmnet
  • 22:48 ladsgroup@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts db1253.eqiad.wmnet
  • 22:47 ladsgroup@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1253.eqiad.wmnet
  • 22:43 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: Firmware upgrade (T396648)
  • 22:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db1253 (T396648)', diff saved to https://phabricator.wikimedia.org/P77754 and previous config saved to /var/cache/conftool/dbconfig/20250611-224035-ladsgroup.json
  • 21:48 cdobbins@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp7001.magru.wmnet} and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
  • 21:43 cdobbins@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp7001.magru.wmnet} and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
  • 21:28 apine@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 21:27 apine@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 21:27 apine@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 21:26 apine@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 21:24 cdobbins@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp7001.magru.wmnet} and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
  • 21:21 jforrester@deploy1003: Finished scap sync-world: Backport for WikiLambda: Set repo-only config only in repo mode, WikiLambda: Enable orchestrator cache updates on edit (T390746) (duration: 09m 45s)
  • 21:18 cdobbins@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp7001.magru.wmnet} and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
  • 21:14 jforrester@deploy1003: jforrester: Continuing with sync
  • 21:14 jforrester@deploy1003: jforrester: Backport for WikiLambda: Set repo-only config only in repo mode, WikiLambda: Enable orchestrator cache updates on edit (T390746) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:12 jforrester@deploy1003: Started scap sync-world: Backport for WikiLambda: Set repo-only config only in repo mode, WikiLambda: Enable orchestrator cache updates on edit (T390746)
  • 20:31 dwisehaupt@dns1004: END - running authdns-update
  • 20:30 dwisehaupt@dns1004: START - running authdns-update
  • 20:27 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
  • 20:24 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1186.eqiad.wmnet with OS bullseye
  • 20:16 cjming@deploy1003: Finished scap sync-world: Backport for Change OutputPage::wrapWikiTextAsInterface() to soft-deprecation (T396618) (duration: 10m 00s)
  • 20:09 cjming@deploy1003: matmarex, cjming: Continuing with sync
  • 20:08 cjming@deploy1003: matmarex, cjming: Backport for Change OutputPage::wrapWikiTextAsInterface() to soft-deprecation (T396618) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:07 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
  • 20:07 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
  • 20:07 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:06 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:06 cjming@deploy1003: Started scap sync-world: Backport for Change OutputPage::wrapWikiTextAsInterface() to soft-deprecation (T396618)
  • 20:06 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:05 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:02 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1186.eqiad.wmnet with OS bullseye
  • 20:02 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
  • 19:46 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
  • 19:45 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
  • 19:44 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:38 aokoth@dns1004: END - running authdns-update
  • 19:38 aokoth@dns1004: START - running authdns-update
  • 19:34 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:32 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:31 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:31 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:30 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1186.eqiad.wmnet with OS bullseye
  • 19:17 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:16 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
  • 19:13 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
  • 19:10 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1186.eqiad.wmnet with OS bullseye
  • 19:02 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
  • 18:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T395241)', diff saved to https://phabricator.wikimedia.org/P77751 and previous config saved to /var/cache/conftool/dbconfig/20250611-185748-fceratto.json
  • 18:56 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
  • 18:52 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:51 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 18:51 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 18:51 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 18:50 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 18:50 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 18:50 moritzm: remove ganeti1047 from Ganeti cluster in eqiad for hardware diagnosis
  • 18:50 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 18:50 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 18:49 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 18:49 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:49 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:44 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:43 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1047.eqiad.wmnet
  • 18:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P77750 and previous config saved to /var/cache/conftool/dbconfig/20250611-184242-fceratto.json
  • 18:42 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 18:37 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:37 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1186 - vriley@cumin1002"
  • 18:37 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1186 - vriley@cumin1002"
  • 18:37 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 18:36 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 18:36 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 18:36 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 18:36 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 18:35 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 18:35 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 18:34 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 18:31 urandom: truncating restbase mobile-sections table — T395845
  • 18:30 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 18:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P77749 and previous config saved to /var/cache/conftool/dbconfig/20250611-182735-fceratto.json
  • 18:26 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:26 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1186 - vriley@cumin1002"
  • 18:26 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1186 - vriley@cumin1002"
  • 18:23 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 18:21 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.5 refs T392175
  • 18:16 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
  • 18:16 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1002"
  • 18:13 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1002"
  • 18:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T395241)', diff saved to https://phabricator.wikimedia.org/P77748 and previous config saved to /var/cache/conftool/dbconfig/20250611-181228-fceratto.json
  • 18:12 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
  • 18:12 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 18:10 sukhe: sudo cumin 'A:lvs-low-traffic-eqiad or A:lvs-low-traffic-codfw' 'run-puppet-agent': T143553
  • 18:09 brennen: 1.45.0-wmf.5 train status (T392175): no current blockers, logs reasonably clean, rolling to group1
  • 18:08 sukhe: sudo cumin 'A:lvs-secondary-eqiad or A:lvs-secondary-codfw' 'run-puppet-agent': T143553
  • 18:06 sukhe@dns1004: END - running authdns-update
  • 18:06 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 18:05 sukhe@dns1004: START - running authdns-update
  • 18:04 sukhe@dns1004: END - running authdns-update
  • 18:03 sukhe@dns1004: START - running authdns-update
  • 18:03 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1251 (T395241)', diff saved to https://phabricator.wikimedia.org/P77747 and previous config saved to /var/cache/conftool/dbconfig/20250611-180309-fceratto.json
  • 18:03 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1251.eqiad.wmnet with reason: Maintenance
  • 18:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T395241)', diff saved to https://phabricator.wikimedia.org/P77746 and previous config saved to /var/cache/conftool/dbconfig/20250611-180254-fceratto.json
  • 17:57 ryankemper: T143553 Pooled `dns-disc=search-(omega|psi)` per plan in https://phabricator.wikimedia.org/T143553#10861215
  • 17:56 ryankemper@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=search-omega
  • 17:56 ryankemper@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=search-psi
  • 17:55 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1013.eqiad.wmnet with reason: host reimage
  • 17:51 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1013.eqiad.wmnet with reason: host reimage
  • 17:50 sukhe: running agent on A:dnsbox T143553
  • 17:48 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1012.eqiad.wmnet with reason: host reimage
  • 17:48 ryankemper: T143553 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1151300 to add dnsdisc entries for omega/psi clusters (second patch in plan https://phabricator.wikimedia.org/T143553#10861215)
  • 17:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P77745 and previous config saved to /var/cache/conftool/dbconfig/20250611-174747-fceratto.json
  • 17:45 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1012.eqiad.wmnet with reason: host reimage
  • 17:37 ryankemper: T143553 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1151308 (first patch in plan https://phabricator.wikimedia.org/T143553#10861215)
  • 17:35 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
  • 17:35 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
  • 17:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P77744 and previous config saved to /var/cache/conftool/dbconfig/20250611-173240-fceratto.json
  • 17:29 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
  • 17:19 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
  • 17:18 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
  • 17:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T395241)', diff saved to https://phabricator.wikimedia.org/P77743 and previous config saved to /var/cache/conftool/dbconfig/20250611-171733-fceratto.json
  • 17:16 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
  • 17:13 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
  • 17:12 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:09 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T395241)', diff saved to https://phabricator.wikimedia.org/P77742 and previous config saved to /var/cache/conftool/dbconfig/20250611-170922-fceratto.json
  • 17:09 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 17:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T395241)', diff saved to https://phabricator.wikimedia.org/P77741 and previous config saved to /var/cache/conftool/dbconfig/20250611-170857-fceratto.json
  • 17:01 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
  • 17:00 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
  • 16:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P77740 and previous config saved to /var/cache/conftool/dbconfig/20250611-165350-fceratto.json
  • 16:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P77739 and previous config saved to /var/cache/conftool/dbconfig/20250611-163842-fceratto.json
  • 16:34 btullis@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:33 btullis@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1013
  • 16:32 btullis@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1013
  • 16:31 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:31 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: After moving dse-k8s-worker1013 vlan - btullis@cumin1002"
  • 16:31 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: After moving dse-k8s-worker1013 vlan - btullis@cumin1002"
  • 16:27 btullis@cumin1002: START - Cookbook sre.dns.netbox
  • 16:25 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dse-k8s-worker1013.eqiad.wmnet
  • 16:25 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:25 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dse-k8s-worker1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
  • 16:25 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dse-k8s-worker1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
  • 16:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T395241)', diff saved to https://phabricator.wikimedia.org/P77738 and previous config saved to /var/cache/conftool/dbconfig/20250611-162335-fceratto.json
  • 16:18 btullis@cumin1002: START - Cookbook sre.dns.netbox
  • 16:15 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T395241)', diff saved to https://phabricator.wikimedia.org/P77737 and previous config saved to /var/cache/conftool/dbconfig/20250611-161509-fceratto.json
  • 16:15 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 16:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T395241)', diff saved to https://phabricator.wikimedia.org/P77736 and previous config saved to /var/cache/conftool/dbconfig/20250611-161444-fceratto.json
  • 16:10 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@b0517a4]: Deploy to pickup T385112#10905490. (duration: 02m 14s)
  • 16:10 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@b0517a4]: Deploy to pickup T385112#10905490.
  • 16:09 btullis@cumin1002: START - Cookbook sre.hosts.decommission for hosts dse-k8s-worker1013.eqiad.wmnet
  • 16:02 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
  • 16:01 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:00 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
  • 15:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P77735 and previous config saved to /var/cache/conftool/dbconfig/20250611-155937-fceratto.json
  • 15:59 dancy@deploy1003: Installation of scap version "4.173.0" completed for 2 hosts
  • 15:57 dancy@deploy1003: Installing scap version "4.173.0" for 2 host(s)
  • 15:56 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 15:56 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 15:56 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 15:55 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 15:55 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 15:55 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 15:55 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 15:55 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 15:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:47 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
  • 15:46 btullis@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1012
  • 15:44 btullis@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1012
  • 15:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P77734 and previous config saved to /var/cache/conftool/dbconfig/20250611-154430-fceratto.json
  • 15:43 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:42 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:42 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: After moving dse-k8s-worker1012 vlan - btullis@cumin1002"
  • 15:42 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: After moving dse-k8s-worker1012 vlan - btullis@cumin1002"
  • 15:39 btullis@cumin1002: START - Cookbook sre.dns.netbox
  • 15:38 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:38 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 264525
  • 15:38 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 264525
  • 15:36 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:34 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:30 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T395241)', diff saved to https://phabricator.wikimedia.org/P77733 and previous config saved to /var/cache/conftool/dbconfig/20250611-152923-fceratto.json
  • 15:29 vgutierrez: use Google Trust Services (GTS) unified TLS certificate on eqsin - T395131
  • 15:26 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:26 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts relforge[1003-1004].eqiad.wmnet
  • 15:24 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:24 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: relforge[1003-1004].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
  • 15:24 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: relforge[1003-1004].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
  • 15:24 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dse-k8s-worker1012.eqiad.wmnet
  • 15:24 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:24 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dse-k8s-worker1012.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
  • 15:23 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dse-k8s-worker1012.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
  • 15:22 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T395241)', diff saved to https://phabricator.wikimedia.org/P77732 and previous config saved to /var/cache/conftool/dbconfig/20250611-152220-fceratto.json
  • 15:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 15:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T395241)', diff saved to https://phabricator.wikimedia.org/P77731 and previous config saved to /var/cache/conftool/dbconfig/20250611-152155-fceratto.json
  • 15:21 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet [reason: repooling after testing 9.2.10 upgrade: T390912]
  • 15:21 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:20 btullis@cumin1002: START - Cookbook sre.dns.netbox
  • 15:19 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade of ATS on P{cp4037*} and A:cp - 9.2.10 upgrade (T390912)
  • 15:15 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan1001.eqiad.wmnet
  • 15:15 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade of ATS on P{cp4037*} and A:cp - 9.2.10 upgrade (T390912)
  • 15:15 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 15:14 btullis@cumin1002: START - Cookbook sre.hosts.decommission for hosts dse-k8s-worker1012.eqiad.wmnet
  • 15:14 sukhe: depool cp4037 to test ATS 9.2.10 upgrade: T390912
  • 15:13 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet [reason: testing 9.2.10 upgrade]
  • 15:10 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.2.10-1wm2_amd64.changes: T390912
  • 15:09 apine@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:09 apine@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:09 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:08 elukey@cumin1003: START - Cookbook sre.hosts.provision for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:08 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:06 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts relforge[1003-1004].eqiad.wmnet
  • 15:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P77729 and previous config saved to /var/cache/conftool/dbconfig/20250611-150647-fceratto.json
  • 15:06 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan1001.eqiad.wmnet
  • 15:03 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:03 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:02 bking@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:01 bking@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:57 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
  • 14:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P77727 and previous config saved to /var/cache/conftool/dbconfig/20250611-145140-fceratto.json
  • 14:46 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2001.codfw.wmnet
  • 14:45 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts relforge[1003-1004].eqiad.wmnet
  • 14:44 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts relforge[1003-1004].eqiad.wmnet
  • 14:40 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1047.eqiad.wmnet
  • 14:40 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet
  • 14:39 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan2001.codfw.wmnet
  • 14:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T395241)', diff saved to https://phabricator.wikimedia.org/P77726 and previous config saved to /var/cache/conftool/dbconfig/20250611-143633-fceratto.json
  • 14:34 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet
  • 14:34 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
  • 14:31 apine@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:31 apine@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:28 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T395241)', diff saved to https://phabricator.wikimedia.org/P77724 and previous config saved to /var/cache/conftool/dbconfig/20250611-142816-fceratto.json
  • 14:28 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 14:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T395241)', diff saved to https://phabricator.wikimedia.org/P77723 and previous config saved to /var/cache/conftool/dbconfig/20250611-142750-fceratto.json
  • 14:26 apine@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:23 apine@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:22 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:21 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:19 apine@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:18 apine@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:18 apine@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:18 apine@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:17 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 14:16 apine@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:16 apine@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:15 elukey@cumin1003: START - Cookbook sre.hosts.provision for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 14:15 andrew-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 14:15 andrew-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 14:13 apine@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:13 apine@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:12 apine@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P77722 and previous config saved to /var/cache/conftool/dbconfig/20250611-141243-fceratto.json
  • 14:12 andrew-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 14:11 apine@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:11 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:10 andrew-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 14:10 apine@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:10 apine@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:08 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Set $wgPHPSessionHandling to 'disable' on testwiki and beta cluster (T362324), Stop logging $wgPHPSessionHandling warnings for now (T393963) (duration: 11m 14s)
  • 14:06 andrew-wmde@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 14:04 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 10310
  • 14:04 andrew-wmde@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 14:01 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with sync
  • 14:00 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 10310
  • 14:00 ayounsi@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 10310
  • 13:59 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for Set $wgPHPSessionHandling to 'disable' on testwiki and beta cluster (T362324), Stop logging $wgPHPSessionHandling warnings for now (T393963) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P77721 and previous config saved to /var/cache/conftool/dbconfig/20250611-135736-fceratto.json
  • 13:57 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 10310
  • 13:57 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Set $wgPHPSessionHandling to 'disable' on testwiki and beta cluster (T362324), Stop logging $wgPHPSessionHandling warnings for now (T393963)
  • 13:53 esanders@deploy1003: Finished scap sync-world: Backport for Enable DiscussionTools visual enhancements everywhere except 12 wikis (T392121) (duration: 12m 36s)
  • 13:50 vgutierrez: upload varnish 7.1.1-2~bpo11+wmf2 to apt.wm.o (bullseye-wikimedia) - T396581
  • 13:48 kart_: Updated Recommendation-API to 2025-06-10-203235-production (T374695)
  • 13:47 kartik@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 13:46 esanders@deploy1003: esanders: Continuing with sync
  • 13:45 hnowlan: migrating reading lists out of restbase for all wikis
  • 13:43 esanders@deploy1003: esanders: Backport for Enable DiscussionTools visual enhancements everywhere except 12 wikis (T392121) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:43 kartik@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 13:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T395241)', diff saved to https://phabricator.wikimedia.org/P77720 and previous config saved to /var/cache/conftool/dbconfig/20250611-134230-fceratto.json
  • 13:41 esanders@deploy1003: Started scap sync-world: Backport for Enable DiscussionTools visual enhancements everywhere except 12 wikis (T392121)
  • 13:39 kartik@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 13:38 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Update searchsuggest message key (T396219) (duration: 09m 57s)
  • 13:34 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T395241)', diff saved to https://phabricator.wikimedia.org/P77719 and previous config saved to /var/cache/conftool/dbconfig/20250611-133420-fceratto.json
  • 13:34 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 13:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T395241)', diff saved to https://phabricator.wikimedia.org/P77718 and previous config saved to /var/cache/conftool/dbconfig/20250611-133355-fceratto.json
  • 13:31 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Continuing with sync
  • 13:31 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Backport for Update searchsuggest message key (T396219) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:28 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Update searchsuggest message key (T396219)
  • 13:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P77717 and previous config saved to /var/cache/conftool/dbconfig/20250611-131848-fceratto.json
  • 13:14 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for SUL3: Enable client hints data on the auth shared domain (T395185) (duration: 11m 09s)
  • 13:11 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1047.eqiad.wmnet
  • 13:07 lucaswerkmeister-wmde@deploy1003: d3r1ck01, lucaswerkmeister-wmde: Continuing with sync
  • 13:05 lucaswerkmeister-wmde@deploy1003: d3r1ck01, lucaswerkmeister-wmde: Backport for SUL3: Enable client hints data on the auth shared domain (T395185) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P77716 and previous config saved to /var/cache/conftool/dbconfig/20250611-130341-fceratto.json
  • 13:03 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
  • 13:03 akosiaris: T393557 block requests to /api/rest_v1/page/data-parsoid
  • 13:03 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for SUL3: Enable client hints data on the auth shared domain (T395185)
  • 13:00 XioNoX: disable lvs6002 secondary link switch port - T367731
  • 12:58 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1047.eqiad.wmnet
  • 12:54 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
  • 12:54 XioNoX: disable lvs3008 secondary link switch port - T367731
  • 12:54 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1047.eqiad.wmnet
  • 12:51 XioNoX: disable lvs3009 secondary link switch port - T367731
  • 12:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T395241)', diff saved to https://phabricator.wikimedia.org/P77715 and previous config saved to /var/cache/conftool/dbconfig/20250611-124834-fceratto.json
  • 12:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:47 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
  • 12:47 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1047.eqiad.wmnet
  • 12:43 XioNoX: disable lvs7001 secondary link switch port - T367731
  • 12:41 XioNoX: disable lvs7002 secondary link switch port - T367731
  • 12:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T396130)', diff saved to https://phabricator.wikimedia.org/P77714 and previous config saved to /var/cache/conftool/dbconfig/20250611-123753-marostegui.json
  • 12:37 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T395241)', diff saved to https://phabricator.wikimedia.org/P77713 and previous config saved to /var/cache/conftool/dbconfig/20250611-123727-fceratto.json
  • 12:37 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 12:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T395241)', diff saved to https://phabricator.wikimedia.org/P77712 and previous config saved to /var/cache/conftool/dbconfig/20250611-123702-fceratto.json
  • 12:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P77711 and previous config saved to /var/cache/conftool/dbconfig/20250611-122246-marostegui.json
  • 12:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P77710 and previous config saved to /var/cache/conftool/dbconfig/20250611-122155-fceratto.json
  • 12:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P77709 and previous config saved to /var/cache/conftool/dbconfig/20250611-120740-marostegui.json
  • 12:07 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
  • 12:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P77708 and previous config saved to /var/cache/conftool/dbconfig/20250611-120648-fceratto.json
  • 12:01 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1046.eqiad.wmnet
  • 12:01 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet
  • 11:56 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet
  • 11:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T396130)', diff saved to https://phabricator.wikimedia.org/P77707 and previous config saved to /var/cache/conftool/dbconfig/20250611-115231-marostegui.json
  • 11:51 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1046.eqiad.wmnet
  • 11:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T395241)', diff saved to https://phabricator.wikimedia.org/P77706 and previous config saved to /var/cache/conftool/dbconfig/20250611-115140-fceratto.json
  • 11:46 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
  • 11:44 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T395241)', diff saved to https://phabricator.wikimedia.org/P77704 and previous config saved to /var/cache/conftool/dbconfig/20250611-114447-fceratto.json
  • 11:44 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 11:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T395241)', diff saved to https://phabricator.wikimedia.org/P77703 and previous config saved to /var/cache/conftool/dbconfig/20250611-114422-fceratto.json
  • 11:43 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1045.eqiad.wmnet
  • 11:43 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet
  • 11:42 XioNoX: disable lvs3010 secondary link switch port - T367731
  • 11:41 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
  • 11:39 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:39 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
  • 11:38 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:38 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet
  • 11:35 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1045.eqiad.wmnet
  • 11:35 jmm@dns1004: END - running authdns-update
  • 11:34 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1044.eqiad.wmnet
  • 11:34 jmm@dns1004: START - running authdns-update
  • 11:34 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
  • 11:34 klausman@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ml-serve1001.eqiad.wmnet
  • 11:34 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
  • 11:34 XioNoX: disable lvs7003 secondary link switch port - T367731
  • 11:33 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet
  • 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2227 (T396130)', diff saved to https://phabricator.wikimedia.org/P77702 and previous config saved to /var/cache/conftool/dbconfig/20250611-113336-marostegui.json
  • 11:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2227.codfw.wmnet with reason: Maintenance
  • 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T396130)', diff saved to https://phabricator.wikimedia.org/P77701 and previous config saved to /var/cache/conftool/dbconfig/20250611-113312-marostegui.json
  • 11:32 jmm@puppetserver1001: conftool action : set/pooled=yes; selector: name=ncredir7003.magru.wmnet
  • 11:32 jmm@puppetserver1001: conftool action : set/weight=1; selector: name=ncredir7003.magru.wmnet
  • 11:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P77700 and previous config saved to /var/cache/conftool/dbconfig/20250611-112914-fceratto.json
  • 11:28 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
  • 11:28 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet
  • 11:28 Ammar: Ran fixStuckGlobalRename.php for T396545
  • 11:24 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1044.eqiad.wmnet
  • 11:22 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1043.eqiad.wmnet
  • 11:22 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet
  • 11:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P77699 and previous config saved to /var/cache/conftool/dbconfig/20250611-111805-marostegui.json
  • 11:17 moritzm: installing librabbitmq security updates
  • 11:16 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet
  • 11:15 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 11:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P77698 and previous config saved to /var/cache/conftool/dbconfig/20250611-111407-fceratto.json
  • 11:13 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1043.eqiad.wmnet
  • 11:06 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:06 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1042.eqiad.wmnet
  • 11:06 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet
  • 11:06 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:05 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:05 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:03 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P77697 and previous config saved to /var/cache/conftool/dbconfig/20250611-110257-marostegui.json
  • 11:02 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:02 brouberol@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:02 brouberol@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:01 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet
  • 10:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T395241)', diff saved to https://phabricator.wikimedia.org/P77696 and previous config saved to /var/cache/conftool/dbconfig/20250611-105900-fceratto.json
  • 10:57 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1042.eqiad.wmnet
  • 10:55 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet
  • 10:55 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet
  • 10:50 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet
  • 10:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T395241)', diff saved to https://phabricator.wikimedia.org/P77695 and previous config saved to /var/cache/conftool/dbconfig/20250611-104825-fceratto.json
  • 10:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 10:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T396130)', diff saved to https://phabricator.wikimedia.org/P77694 and previous config saved to /var/cache/conftool/dbconfig/20250611-104750-marostegui.json
  • 10:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T395241)', diff saved to https://phabricator.wikimedia.org/P77693 and previous config saved to /var/cache/conftool/dbconfig/20250611-104741-fceratto.json
  • 10:46 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet
  • 10:45 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1040.eqiad.wmnet
  • 10:45 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet
  • 10:40 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet
  • 10:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P77692 and previous config saved to /var/cache/conftool/dbconfig/20250611-103234-fceratto.json
  • 10:32 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1040.eqiad.wmnet
  • 10:32 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:29 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1039.eqiad.wmnet
  • 10:29 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet
  • 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2205 (T396130)', diff saved to https://phabricator.wikimedia.org/P77691 and previous config saved to /var/cache/conftool/dbconfig/20250611-102902-marostegui.json
  • 10:28 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 10:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T396130)', diff saved to https://phabricator.wikimedia.org/P77690 and previous config saved to /var/cache/conftool/dbconfig/20250611-102839-marostegui.json
  • 10:23 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet
  • 10:20 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1039.eqiad.wmnet
  • 10:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P77689 and previous config saved to /var/cache/conftool/dbconfig/20250611-101727-fceratto.json
  • 10:15 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1038.eqiad.wmnet
  • 10:15 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1038.eqiad.wmnet
  • 10:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P77688 and previous config saved to /var/cache/conftool/dbconfig/20250611-101332-marostegui.json
  • 10:09 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:07 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1038.eqiad.wmnet
  • 10:06 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1038.eqiad.wmnet
  • 10:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T395241)', diff saved to https://phabricator.wikimedia.org/P77687 and previous config saved to /var/cache/conftool/dbconfig/20250611-100220-fceratto.json
  • 10:00 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
  • 09:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P77686 and previous config saved to /var/cache/conftool/dbconfig/20250611-095825-marostegui.json
  • 09:56 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:56 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1037.eqiad.wmnet
  • 09:55 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1037.eqiad.wmnet
  • 09:53 vgutierrez: restarting varnish on cp5018 to clear VarnishChildRestarted alert
  • 09:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T395241)', diff saved to https://phabricator.wikimedia.org/P77684 and previous config saved to /var/cache/conftool/dbconfig/20250611-095139-fceratto.json
  • 09:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1195.eqiad.wmnet with reason: Maintenance
  • 09:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T395241)', diff saved to https://phabricator.wikimedia.org/P77683 and previous config saved to /var/cache/conftool/dbconfig/20250611-095113-fceratto.json
  • 09:48 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1037.eqiad.wmnet
  • 09:44 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1013.eqiad.wmnet with reason: host reimage
  • 09:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T396130)', diff saved to https://phabricator.wikimedia.org/P77682 and previous config saved to /var/cache/conftool/dbconfig/20250611-094319-marostegui.json
  • 09:40 brouberol@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1013.eqiad.wmnet with reason: host reimage
  • 09:38 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1037.eqiad.wmnet
  • 09:37 vgutierrez: use Google Trust Services (GTS) unified TLS certificate on magru - T395131
  • 09:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P77681 and previous config saved to /var/cache/conftool/dbconfig/20250611-093606-fceratto.json
  • 09:34 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T396130)', diff saved to https://phabricator.wikimedia.org/P77680 and previous config saved to /var/cache/conftool/dbconfig/20250611-092518-marostegui.json
  • 09:25 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 09:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T396130)', diff saved to https://phabricator.wikimedia.org/P77679 and previous config saved to /var/cache/conftool/dbconfig/20250611-092457-marostegui.json
  • 09:24 brouberol@cumin2002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
  • 09:23 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-db1002 to dse-k8s-worker1013
  • 09:22 brouberol@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1013
  • 09:21 brouberol@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1013
  • 09:21 brouberol@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker1013 on all recursors
  • 09:21 brouberol@cumin2002: START - Cookbook sre.dns.wipe-cache dse-k8s-worker1013 on all recursors
  • 09:21 brouberol@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:21 brouberol@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-db1002 to dse-k8s-worker1013 - brouberol@cumin2002"
  • 09:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P77678 and previous config saved to /var/cache/conftool/dbconfig/20250611-092059-fceratto.json
  • 09:20 brouberol@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-db1002 to dse-k8s-worker1013 - brouberol@cumin2002"
  • 09:19 elukey: repool eqiad for inference.discovery.wmnet - was left depooled after a long maintenance for k8s infra changes a week ago
  • 09:18 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=inference,name=eqiad
  • 09:17 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1036.eqiad.wmnet
  • 09:17 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1036.eqiad.wmnet
  • 09:15 brouberol@cumin2002: START - Cookbook sre.dns.netbox
  • 09:14 brouberol@cumin2002: START - Cookbook sre.hosts.rename from an-db1002 to dse-k8s-worker1013
  • 09:12 moritzm: installing libfile-find-rule-perl security updates
  • 09:11 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
  • 09:11 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
  • 09:10 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P77674 and previous config saved to /var/cache/conftool/dbconfig/20250611-090949-marostegui.json
  • 09:09 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1036.eqiad.wmnet
  • 09:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T395241)', diff saved to https://phabricator.wikimedia.org/P77673 and previous config saved to /var/cache/conftool/dbconfig/20250611-090552-fceratto.json
  • 09:00 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:00 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-staging-worker
  • 08:59 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1036.eqiad.wmnet
  • 08:56 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T395241)', diff saved to https://phabricator.wikimedia.org/P77672 and previous config saved to /var/cache/conftool/dbconfig/20250611-085615-fceratto.json
  • 08:56 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 08:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T395241)', diff saved to https://phabricator.wikimedia.org/P77671 and previous config saved to /var/cache/conftool/dbconfig/20250611-085552-fceratto.json
  • 08:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P77670 and previous config saved to /var/cache/conftool/dbconfig/20250611-085442-marostegui.json
  • 08:54 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1012.eqiad.wmnet with reason: host reimage
  • 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netflow1003.eqiad.wmnet
  • 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netflow1003.eqiad.wmnet with OS bookworm
  • 08:51 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1035.eqiad.wmnet
  • 08:51 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1035.eqiad.wmnet
  • 08:51 brouberol@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1012.eqiad.wmnet with reason: host reimage
  • 08:44 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1035.eqiad.wmnet
  • 08:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P77669 and previous config saved to /var/cache/conftool/dbconfig/20250611-084045-fceratto.json
  • 08:39 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1035.eqiad.wmnet
  • 08:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T396130)', diff saved to https://phabricator.wikimedia.org/P77668 and previous config saved to /var/cache/conftool/dbconfig/20250611-083935-marostegui.json
  • 08:37 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1034.eqiad.wmnet
  • 08:37 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1034.eqiad.wmnet
  • 08:35 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host dse-k8s-worker1012
  • 08:35 brouberol@cumin2002: START - Cookbook sre.hosts.move-vlan for host dse-k8s-worker1012
  • 08:35 brouberol@cumin2002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
  • 08:33 tappof: T395240 May 2025 Bookworm reboots: alert2002.wikimedia.org
  • 08:32 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-db1001 to dse-k8s-worker1012
  • 08:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow1003.eqiad.wmnet with reason: host reimage
  • 08:32 brouberol@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1012
  • 08:32 tappof@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert2002.wikimedia.org
  • 08:32 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host alert2002.wikimedia.org
  • 08:30 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1034.eqiad.wmnet
  • 08:30 brouberol@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1012
  • 08:30 brouberol@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker1012 on all recursors
  • 08:30 brouberol@cumin2002: START - Cookbook sre.dns.wipe-cache dse-k8s-worker1012 on all recursors
  • 08:30 brouberol@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:30 brouberol@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-db1001 to dse-k8s-worker1012 - brouberol@cumin2002"
  • 08:30 brouberol@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-db1001 to dse-k8s-worker1012 - brouberol@cumin2002"
  • 08:27 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow1003.eqiad.wmnet with reason: host reimage
  • 08:27 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
  • 08:26 brouberol@cumin2002: START - Cookbook sre.dns.netbox
  • 08:26 brouberol@cumin2002: START - Cookbook sre.hosts.rename from an-db1001 to dse-k8s-worker1012
  • 08:26 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
  • 08:25 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1033.eqiad.wmnet
  • 08:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P77667 and previous config saved to /var/cache/conftool/dbconfig/20250611-082538-fceratto.json
  • 08:22 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1008.eqiad.wmnet
  • 08:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T396130)', diff saved to https://phabricator.wikimedia.org/P77666 and previous config saved to /var/cache/conftool/dbconfig/20250611-082039-marostegui.json
  • 08:20 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 08:20 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
  • 08:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T396130)', diff saved to https://phabricator.wikimedia.org/P77665 and previous config saved to /var/cache/conftool/dbconfig/20250611-082018-marostegui.json
  • 08:19 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1033.eqiad.wmnet
  • 08:18 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
  • 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
  • 08:15 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-eqiad
  • 08:15 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
  • 08:14 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
  • 08:13 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host netflow1003.eqiad.wmnet with OS bookworm
  • 08:12 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host stat1008.eqiad.wmnet
  • 08:11 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
  • 08:10 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2002.codfw.wmnet
  • 08:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow1003.eqiad.wmnet - ayounsi@cumin1003"
  • 08:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T395241)', diff saved to https://phabricator.wikimedia.org/P77664 and previous config saved to /var/cache/conftool/dbconfig/20250611-081031-fceratto.json
  • 08:10 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow1003.eqiad.wmnet - ayounsi@cumin1003"
  • 08:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow1003.eqiad.wmnet on all recursors
  • 08:10 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache netflow1003.eqiad.wmnet on all recursors
  • 08:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow1003.eqiad.wmnet - ayounsi@cumin1003"
  • 08:09 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-codfw
  • 08:09 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow1003.eqiad.wmnet - ayounsi@cumin1003"
  • 08:07 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-codfw
  • 08:07 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2002.codfw.wmnet
  • 08:07 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2001.codfw.wmnet
  • 08:05 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 08:05 ayounsi@cumin1003: START - Cookbook sre.ganeti.makevm for new host netflow1003.eqiad.wmnet
  • 08:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P77662 and previous config saved to /var/cache/conftool/dbconfig/20250611-080511-marostegui.json
  • 08:04 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
  • 08:03 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2001.codfw.wmnet
  • 08:03 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
  • 08:01 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T395241)', diff saved to https://phabricator.wikimedia.org/P77661 and previous config saved to /var/cache/conftool/dbconfig/20250611-080101-fceratto.json
  • 08:00 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 07:59 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
  • 07:59 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
  • 07:59 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
  • 07:57 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 07:56 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
  • 07:53 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 07:53 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
  • 07:52 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
  • 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77660 and previous config saved to /var/cache/conftool/dbconfig/20250611-075240-root.json
  • 07:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P77659 and previous config saved to /var/cache/conftool/dbconfig/20250611-075004-marostegui.json
  • 07:37 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77658 and previous config saved to /var/cache/conftool/dbconfig/20250611-073733-root.json
  • 07:35 marostegui@cumin1002: dbctl commit (dc=all): 'es2028 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77657 and previous config saved to /var/cache/conftool/dbconfig/20250611-073530-root.json
  • 07:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T396130)', diff saved to https://phabricator.wikimedia.org/P77656 and previous config saved to /var/cache/conftool/dbconfig/20250611-073457-marostegui.json
  • 07:33 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
  • 07:33 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1031.eqiad.wmnet
  • 07:31 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
  • 07:31 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1031.eqiad.wmnet
  • 07:27 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
  • 07:24 slyngshede@dns1004: END - running authdns-update
  • 07:24 slyngshede@dns1004: START - running authdns-update
  • 07:22 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77655 and previous config saved to /var/cache/conftool/dbconfig/20250611-072227-root.json
  • 07:22 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 07:20 marostegui@cumin1002: dbctl commit (dc=all): 'es2028 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77654 and previous config saved to /var/cache/conftool/dbconfig/20250611-072024-root.json
  • 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T396130)', diff saved to https://phabricator.wikimedia.org/P77653 and previous config saved to /var/cache/conftool/dbconfig/20250611-071612-marostegui.json
  • 07:16 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 07:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T396130)', diff saved to https://phabricator.wikimedia.org/P77652 and previous config saved to /var/cache/conftool/dbconfig/20250611-071549-marostegui.json
  • 07:10 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1030.eqiad.wmnet
  • 07:09 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1030.eqiad.wmnet
  • 07:07 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77651 and previous config saved to /var/cache/conftool/dbconfig/20250611-070722-root.json
  • 07:05 marostegui@cumin1002: dbctl commit (dc=all): 'es2028 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P77650 and previous config saved to /var/cache/conftool/dbconfig/20250611-070519-root.json
  • 07:03 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1030.eqiad.wmnet
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77649 and previous config saved to /var/cache/conftool/dbconfig/20250611-070117-root.json
  • 07:00 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1030.eqiad.wmnet
  • 07:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P77648 and previous config saved to /var/cache/conftool/dbconfig/20250611-070042-marostegui.json
  • 06:52 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77647 and previous config saved to /var/cache/conftool/dbconfig/20250611-065217-root.json
  • 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 06:50 marostegui@cumin1002: dbctl commit (dc=all): 'es2028 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77646 and previous config saved to /var/cache/conftool/dbconfig/20250611-065013-root.json
  • 06:49 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
  • 06:49 jmm@cumin2002: END (FAIL) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=1) rolling restart_daemons on A:wdqs-all
  • 06:48 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1029.eqiad.wmnet
  • 06:48 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
  • 06:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2027.codfw.wmnet with reason: Maintenance
  • 06:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77645 and previous config saved to /var/cache/conftool/dbconfig/20250611-064611-root.json
  • 06:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2027 T395241', diff saved to https://phabricator.wikimedia.org/P77644 and previous config saved to /var/cache/conftool/dbconfig/20250611-064606-marostegui.json
  • 06:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P77643 and previous config saved to /var/cache/conftool/dbconfig/20250611-064535-marostegui.json
  • 06:43 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2028.codfw.wmnet with reason: Maintenance
  • 06:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2028 T395241', diff saved to https://phabricator.wikimedia.org/P77642 and previous config saved to /var/cache/conftool/dbconfig/20250611-064314-marostegui.json
  • 06:42 marostegui@cumin1002: dbctl commit (dc=all): 'db2229 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77641 and previous config saved to /var/cache/conftool/dbconfig/20250611-064246-root.json
  • 06:42 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
  • 06:38 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
  • 06:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2238 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77640 and previous config saved to /var/cache/conftool/dbconfig/20250611-063549-root.json
  • 06:32 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all
  • 06:32 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P77639 and previous config saved to /var/cache/conftool/dbconfig/20250611-063059-root.json
  • 06:30 marostegui@deploy1003: Finished scap sync-world: Backport for Revert "db-production.php: Disable writes in es7" (duration: 10m 06s)
  • 06:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T396130)', diff saved to https://phabricator.wikimedia.org/P77638 and previous config saved to /var/cache/conftool/dbconfig/20250611-063027-marostegui.json
  • 06:30 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
  • 06:27 marostegui@cumin1002: dbctl commit (dc=all): 'db2229 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77637 and previous config saved to /var/cache/conftool/dbconfig/20250611-062741-root.json
  • 06:25 moritzm: installing libxml2 security updates
  • 06:23 marostegui@deploy1003: marostegui: Continuing with sync
  • 06:22 marostegui@deploy1003: marostegui: Backport for Revert "db-production.php: Disable writes in es7" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 06:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2238 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77636 and previous config saved to /var/cache/conftool/dbconfig/20250611-062044-root.json
  • 06:20 marostegui@deploy1003: Started scap sync-world: Backport for Revert "db-production.php: Disable writes in es7"
  • 06:19 marostegui: Starting es7 eqiad failover from es1039 to es1035 - T396550
  • 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Pool es1039', diff saved to https://phabricator.wikimedia.org/P77635 and previous config saved to /var/cache/conftool/dbconfig/20250611-061901-marostegui.json
  • 06:18 marostegui@dns1006: END - running authdns-update
  • 06:17 marostegui@dns1006: START - running authdns-update
  • 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es1035 to es7 primary T396550', diff saved to https://phabricator.wikimedia.org/P77634 and previous config saved to /var/cache/conftool/dbconfig/20250611-061644-root.json
  • 06:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77633 and previous config saved to /var/cache/conftool/dbconfig/20250611-061553-root.json
  • 06:15 marostegui@cumin1002: dbctl commit (dc=all): 'Set es1035 with weight 0 T396550', diff saved to https://phabricator.wikimedia.org/P77632 and previous config saved to /var/cache/conftool/dbconfig/20250611-061501-root.json
  • 06:14 marostegui@deploy1003: Finished scap sync-world: Backport for db-production.php: Disable writes in es7 (T396550) (duration: 10m 03s)
  • 06:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T396130)', diff saved to https://phabricator.wikimedia.org/P77631 and previous config saved to /var/cache/conftool/dbconfig/20250611-061242-marostegui.json
  • 06:12 marostegui@cumin1002: dbctl commit (dc=all): 'db2229 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77630 and previous config saved to /var/cache/conftool/dbconfig/20250611-061236-root.json
  • 06:12 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 06:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T396130)', diff saved to https://phabricator.wikimedia.org/P77629 and previous config saved to /var/cache/conftool/dbconfig/20250611-061219-marostegui.json
  • 06:07 marostegui@deploy1003: marostegui: Continuing with sync
  • 06:06 marostegui@deploy1003: marostegui: Backport for db-production.php: Disable writes in es7 (T396550) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 06:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repool es1040', diff saved to https://phabricator.wikimedia.org/P77628 and previous config saved to /var/cache/conftool/dbconfig/20250611-060552-marostegui.json
  • 06:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2238 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77627 and previous config saved to /var/cache/conftool/dbconfig/20250611-060538-root.json
  • 06:04 marostegui@deploy1003: Started scap sync-world: Backport for db-production.php: Disable writes in es7 (T396550)
  • 06:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repool es1040', diff saved to https://phabricator.wikimedia.org/P77626 and previous config saved to /var/cache/conftool/dbconfig/20250611-060413-marostegui.json
  • 06:03 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es7 T396550
  • 06:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repool es1040', diff saved to https://phabricator.wikimedia.org/P77625 and previous config saved to /var/cache/conftool/dbconfig/20250611-060227-marostegui.json
  • 06:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77624 and previous config saved to /var/cache/conftool/dbconfig/20250611-060048-root.json
  • 05:57 marostegui@cumin1002: dbctl commit (dc=all): 'db2229 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77623 and previous config saved to /var/cache/conftool/dbconfig/20250611-055730-root.json
  • 05:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P77622 and previous config saved to /var/cache/conftool/dbconfig/20250611-055711-marostegui.json
  • 05:57 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77621 and previous config saved to /var/cache/conftool/dbconfig/20250611-055705-root.json
  • 05:52 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 05:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1233 T396549', diff saved to https://phabricator.wikimedia.org/P77620 and previous config saved to /var/cache/conftool/dbconfig/20250611-055222-marostegui.json
  • 05:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2238 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77619 and previous config saved to /var/cache/conftool/dbconfig/20250611-055033-root.json
  • 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77618 and previous config saved to /var/cache/conftool/dbconfig/20250611-054835-root.json
  • 05:42 marostegui@cumin1002: dbctl commit (dc=all): 'db2229 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77617 and previous config saved to /var/cache/conftool/dbconfig/20250611-054224-root.json
  • 05:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P77616 and previous config saved to /var/cache/conftool/dbconfig/20250611-054204-marostegui.json
  • 05:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1040.eqiad.wmnet with reason: Maintenance
  • 05:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1040', diff saved to https://phabricator.wikimedia.org/P77615 and previous config saved to /var/cache/conftool/dbconfig/20250611-053903-marostegui.json
  • 05:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2238 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77614 and previous config saved to /var/cache/conftool/dbconfig/20250611-053527-root.json
  • 05:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2238.codfw.wmnet with reason: Maintenance
  • 05:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2238 T396549', diff saved to https://phabricator.wikimedia.org/P77613 and previous config saved to /var/cache/conftool/dbconfig/20250611-052907-marostegui.json
  • 05:27 marostegui@cumin1002: dbctl commit (dc=all): 'db2229 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77612 and previous config saved to /var/cache/conftool/dbconfig/20250611-052719-root.json
  • 05:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T396130)', diff saved to https://phabricator.wikimedia.org/P77611 and previous config saved to /var/cache/conftool/dbconfig/20250611-052657-marostegui.json
  • 05:16 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2229.codfw.wmnet with reason: Maintenance
  • 05:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2229 T396509', diff saved to https://phabricator.wikimedia.org/P77610 and previous config saved to /var/cache/conftool/dbconfig/20250611-051612-marostegui.json
  • 05:15 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2214 to s6 primary T396509', diff saved to https://phabricator.wikimedia.org/P77609 and previous config saved to /var/cache/conftool/dbconfig/20250611-051525-marostegui.json
  • 05:15 marostegui: Starting s6 codfw failover from db2229 to db2214 - T396509
  • 05:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s6 T396509
  • 05:10 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2214 with weight 0 T396509', diff saved to https://phabricator.wikimedia.org/P77608 and previous config saved to /var/cache/conftool/dbconfig/20250611-051056-root.json
  • 05:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T396130)', diff saved to https://phabricator.wikimedia.org/P77607 and previous config saved to /var/cache/conftool/dbconfig/20250611-050911-marostegui.json
  • 05:09 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 05:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 04:37 oblivian@deploy1003: Finished scap sync-world: Backport for robots.txt: add crawl-delay directive for semrushbot (duration: 11m 43s)
  • 04:30 oblivian@deploy1003: oblivian: Continuing with sync
  • 04:28 oblivian@deploy1003: oblivian: Backport for robots.txt: add crawl-delay directive for semrushbot synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 04:25 oblivian@deploy1003: Started scap sync-world: Backport for robots.txt: add crawl-delay directive for semrushbot
  • 02:07 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 02:06 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 02:06 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 02:06 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 02:06 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 02:06 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 00:20 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 00:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T396130)', diff saved to https://phabricator.wikimedia.org/P77606 and previous config saved to /var/cache/conftool/dbconfig/20250611-001949-marostegui.json
  • 00:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P77605 and previous config saved to /var/cache/conftool/dbconfig/20250611-000441-marostegui.json

2025-06-10

  • 23:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P77604 and previous config saved to /var/cache/conftool/dbconfig/20250610-234934-marostegui.json
  • 23:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T396130)', diff saved to https://phabricator.wikimedia.org/P77603 and previous config saved to /var/cache/conftool/dbconfig/20250610-233427-marostegui.json
  • 23:25 krinkle@deploy1003: Finished scap sync-world: Backport for multiversion: Document how it all works (T289318) (duration: 12m 56s)
  • 23:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1254 (T396130)', diff saved to https://phabricator.wikimedia.org/P77602 and previous config saved to /var/cache/conftool/dbconfig/20250610-232206-marostegui.json
  • 23:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1254.eqiad.wmnet with reason: Maintenance
  • 23:18 krinkle@deploy1003: krinkle: Continuing with sync
  • 23:14 krinkle@deploy1003: krinkle: Backport for multiversion: Document how it all works (T289318) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:12 krinkle@deploy1003: Started scap sync-world: Backport for multiversion: Document how it all works (T289318)
  • 23:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 23:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T396130)', diff saved to https://phabricator.wikimedia.org/P77600 and previous config saved to /var/cache/conftool/dbconfig/20250610-231053-marostegui.json
  • 22:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P77599 and previous config saved to /var/cache/conftool/dbconfig/20250610-225546-marostegui.json
  • 22:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P77598 and previous config saved to /var/cache/conftool/dbconfig/20250610-224039-marostegui.json
  • 22:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T396130)', diff saved to https://phabricator.wikimedia.org/P77597 and previous config saved to /var/cache/conftool/dbconfig/20250610-222532-marostegui.json
  • 22:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T396130)', diff saved to https://phabricator.wikimedia.org/P77596 and previous config saved to /var/cache/conftool/dbconfig/20250610-221311-marostegui.json
  • 22:13 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 22:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T396130)', diff saved to https://phabricator.wikimedia.org/P77595 and previous config saved to /var/cache/conftool/dbconfig/20250610-221248-marostegui.json
  • 21:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P77594 and previous config saved to /var/cache/conftool/dbconfig/20250610-215741-marostegui.json
  • 21:51 catrope@deploy1003: Finished scap sync-world: Backport for Fixes TypeError: undefined is not an object (evaluating 'sources.map') (T396370), Fixes TypeError: undefined is not an object (evaluating 'sources.map') (T396370) (duration: 11m 20s)
  • 21:44 catrope@deploy1003: catrope: Continuing with sync
  • 21:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P77593 and previous config saved to /var/cache/conftool/dbconfig/20250610-214234-marostegui.json
  • 21:42 catrope@deploy1003: catrope: Backport for Fixes TypeError: undefined is not an object (evaluating 'sources.map') (T396370), Fixes TypeError: undefined is not an object (evaluating 'sources.map') (T396370) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:39 catrope@deploy1003: Started scap sync-world: Backport for Fixes TypeError: undefined is not an object (evaluating 'sources.map') (T396370), Fixes TypeError: undefined is not an object (evaluating 'sources.map') (T396370)
  • 21:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T396130)', diff saved to https://phabricator.wikimedia.org/P77592 and previous config saved to /var/cache/conftool/dbconfig/20250610-212727-marostegui.json
  • 21:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T396130)', diff saved to https://phabricator.wikimedia.org/P77591 and previous config saved to /var/cache/conftool/dbconfig/20250610-212332-marostegui.json
  • 21:23 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 21:12 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 21:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T396130)', diff saved to https://phabricator.wikimedia.org/P77590 and previous config saved to /var/cache/conftool/dbconfig/20250610-211234-marostegui.json
  • 20:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P77588 and previous config saved to /var/cache/conftool/dbconfig/20250610-205727-marostegui.json
  • 20:56 bking@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:55 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 20:55 bking@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:55 bking@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:55 bking@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:55 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 20:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P77587 and previous config saved to /var/cache/conftool/dbconfig/20250610-204220-marostegui.json
  • 20:32 cjming@deploy1003: Finished scap sync-world: Backport for Replace deprecated wgCirrusSearchWMFExtraFeatures with wgCirrusSearchWeightedTags (T393872) (duration: 10m 18s)
  • 20:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T396130)', diff saved to https://phabricator.wikimedia.org/P77586 and previous config saved to /var/cache/conftool/dbconfig/20250610-202713-marostegui.json
  • 20:26 cjming@deploy1003: cjming, sd: Continuing with sync
  • 20:25 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:24 cjming@deploy1003: cjming, sd: Backport for Replace deprecated wgCirrusSearchWMFExtraFeatures with wgCirrusSearchWeightedTags (T393872) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:24 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:24 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1185 - vriley@cumin1002"
  • 20:24 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1185 - vriley@cumin1002"
  • 20:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 20:22 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 20:22 cjming@deploy1003: Started scap sync-world: Backport for Replace deprecated wgCirrusSearchWMFExtraFeatures with wgCirrusSearchWeightedTags (T393872)
  • 20:20 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 20:19 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:19 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1185 - vriley@cumin1002"
  • 20:19 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1185 - vriley@cumin1002"
  • 20:19 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 20:18 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 20:16 toyofuku@deploy1003: Finished scap sync-world: Backport for Enable empty search recommendations for Vector on all wikipedias, and for Minerva on group1 wikis and wikivoyage (T395344 T395339) (duration: 13m 01s)
  • 20:16 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 20:15 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 20:15 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 20:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T396130)', diff saved to https://phabricator.wikimedia.org/P77585 and previous config saved to /var/cache/conftool/dbconfig/20250610-201441-marostegui.json
  • 20:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 20:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T396130)', diff saved to https://phabricator.wikimedia.org/P77584 and previous config saved to /var/cache/conftool/dbconfig/20250610-201418-marostegui.json
  • 20:11 vriley@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-worker1185
  • 20:11 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1185
  • 20:11 vriley@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-worker1185
  • 20:10 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1185
  • 20:09 toyofuku@deploy1003: bwang, toyofuku: Continuing with sync
  • 20:06 toyofuku@deploy1003: bwang, toyofuku: Backport for Enable empty search recommendations for Vector on all wikipedias, and for Minerva on group1 wikis and wikivoyage (T395344 T395339) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for Enable empty search recommendations for Vector on all wikipedias, and for Minerva on group1 wikis and wikivoyage (T395344 T395339)
  • 19:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P77583 and previous config saved to /var/cache/conftool/dbconfig/20250610-195910-marostegui.json
  • 19:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P77582 and previous config saved to /var/cache/conftool/dbconfig/20250610-194403-marostegui.json
  • 19:41 dwisehaupt@dns1004: END - running authdns-update
  • 19:40 dwisehaupt@dns1004: START - running authdns-update
  • 19:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T396130)', diff saved to https://phabricator.wikimedia.org/P77581 and previous config saved to /var/cache/conftool/dbconfig/20250610-192856-marostegui.json
  • 19:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T396130)', diff saved to https://phabricator.wikimedia.org/P77580 and previous config saved to /var/cache/conftool/dbconfig/20250610-192503-marostegui.json
  • 19:24 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 19:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T396130)', diff saved to https://phabricator.wikimedia.org/P77579 and previous config saved to /var/cache/conftool/dbconfig/20250610-192441-marostegui.json
  • 19:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P77578 and previous config saved to /var/cache/conftool/dbconfig/20250610-190934-marostegui.json
  • 18:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P77577 and previous config saved to /var/cache/conftool/dbconfig/20250610-185426-marostegui.json
  • 18:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T396130)', diff saved to https://phabricator.wikimedia.org/P77576 and previous config saved to /var/cache/conftool/dbconfig/20250610-183919-marostegui.json
  • 18:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T396130)', diff saved to https://phabricator.wikimedia.org/P77575 and previous config saved to /var/cache/conftool/dbconfig/20250610-183528-marostegui.json
  • 18:35 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 18:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T396130)', diff saved to https://phabricator.wikimedia.org/P77574 and previous config saved to /var/cache/conftool/dbconfig/20250610-183505-marostegui.json
  • 18:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P77573 and previous config saved to /var/cache/conftool/dbconfig/20250610-181958-marostegui.json
  • 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.5 refs T392175
  • 18:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P77572 and previous config saved to /var/cache/conftool/dbconfig/20250610-180451-marostegui.json
  • 18:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77571 and previous config saved to /var/cache/conftool/dbconfig/20250610-180333-root.json
  • 17:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T396130)', diff saved to https://phabricator.wikimedia.org/P77570 and previous config saved to /var/cache/conftool/dbconfig/20250610-174944-marostegui.json
  • 17:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77569 and previous config saved to /var/cache/conftool/dbconfig/20250610-174828-root.json
  • 17:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T396130)', diff saved to https://phabricator.wikimedia.org/P77568 and previous config saved to /var/cache/conftool/dbconfig/20250610-173514-marostegui.json
  • 17:35 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 17:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T396130)', diff saved to https://phabricator.wikimedia.org/P77567 and previous config saved to /var/cache/conftool/dbconfig/20250610-173450-marostegui.json
  • 17:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77566 and previous config saved to /var/cache/conftool/dbconfig/20250610-173322-root.json
  • 17:24 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 17:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P77565 and previous config saved to /var/cache/conftool/dbconfig/20250610-171943-marostegui.json
  • 17:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77564 and previous config saved to /var/cache/conftool/dbconfig/20250610-171817-root.json
  • 17:14 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 17:10 mszabo@deploy1003: Finished scap sync-world: Backport for Set ORESDeveloperSetup to false by default (T364705), ores: Disable AbuseFilter integration by default (T364705), tests: Run only defered updates on LinkRecommendationUpdaterTest (duration: 15m 06s)
  • 17:09 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 17:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T395241)', diff saved to https://phabricator.wikimedia.org/P77563 and previous config saved to /var/cache/conftool/dbconfig/20250610-170543-fceratto.json
  • 17:04 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 17:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P77562 and previous config saved to /var/cache/conftool/dbconfig/20250610-170437-marostegui.json
  • 17:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77561 and previous config saved to /var/cache/conftool/dbconfig/20250610-170312-root.json
  • 17:01 mszabo@deploy1003: mszabo: Continuing with sync
  • 16:59 mszabo@deploy1003: mszabo: Backport for Set ORESDeveloperSetup to false by default (T364705), ores: Disable AbuseFilter integration by default (T364705), tests: Run only defered updates on LinkRecommendationUpdaterTest synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:55 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:55 mszabo@deploy1003: Started scap sync-world: Backport for Set ORESDeveloperSetup to false by default (T364705), ores: Disable AbuseFilter integration by default (T364705), tests: Run only defered updates on LinkRecommendationUpdaterTest
  • 16:54 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:52 mszabo@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.45.0-wmf.4,1.45.0-wmf.5 --multiversion-image-name docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --multiversion-debug-image-name docker-registry.discovery.w
  • 16:51 mszabo@deploy1003: Started scap sync-world: Backport for Set ORESDeveloperSetup to false by default (T364705), ores: Disable AbuseFilter integration by default (T364705), tests: Run only defered updates on LinkRecommendationUpdaterTest
  • 16:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P77560 and previous config saved to /var/cache/conftool/dbconfig/20250610-165036-fceratto.json
  • 16:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T396130)', diff saved to https://phabricator.wikimedia.org/P77559 and previous config saved to /var/cache/conftool/dbconfig/20250610-164930-marostegui.json
  • 16:49 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: relforge1003*,relforge1004* for testtesttest - bking@cumin2002 - T390565
  • 16:49 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: relforge1003*,relforge1004* for testtesttest - bking@cumin2002 - T390565
  • 16:48 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: relforge1003*relforg1004* for testtesttest - bking@cumin2002 - T390565
  • 16:48 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: relforge1003*relforg1004* for testtesttest - bking@cumin2002 - T390565
  • 16:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77558 and previous config saved to /var/cache/conftool/dbconfig/20250610-164806-root.json
  • 16:42 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:40 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:38 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 16:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1201 T395989', diff saved to https://phabricator.wikimedia.org/P77557 and previous config saved to /var/cache/conftool/dbconfig/20250610-163742-marostegui.json
  • 16:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P77556 and previous config saved to /var/cache/conftool/dbconfig/20250610-163529-fceratto.json
  • 16:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T396130)', diff saved to https://phabricator.wikimedia.org/P77555 and previous config saved to /var/cache/conftool/dbconfig/20250610-163458-marostegui.json
  • 16:34 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:34 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 16:34 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:32 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:25 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:24 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:23 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 16:21 dancy@deploy1003: Finished scap sync-world: testwikis to 1.45.0-wmf.5 refs T392175 (duration: 44m 02s)
  • 16:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T395241)', diff saved to https://phabricator.wikimedia.org/P77554 and previous config saved to /var/cache/conftool/dbconfig/20250610-162022-fceratto.json
  • 16:18 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:13 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1253 (T395241)', diff saved to https://phabricator.wikimedia.org/P77553 and previous config saved to /var/cache/conftool/dbconfig/20250610-161323-fceratto.json
  • 16:13 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1253.eqiad.wmnet with reason: Maintenance
  • 16:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T395241)', diff saved to https://phabricator.wikimedia.org/P77552 and previous config saved to /var/cache/conftool/dbconfig/20250610-161258-fceratto.json
  • 16:12 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 16:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T396130)', diff saved to https://phabricator.wikimedia.org/P77551 and previous config saved to /var/cache/conftool/dbconfig/20250610-160804-marostegui.json
  • 16:03 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host install7002.wikimedia.org with OS bookworm
  • 15:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P77550 and previous config saved to /var/cache/conftool/dbconfig/20250610-155752-fceratto.json
  • 15:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P77549 and previous config saved to /var/cache/conftool/dbconfig/20250610-155257-marostegui.json
  • 15:47 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install7002.wikimedia.org with reason: host reimage
  • 15:43 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on install7002.wikimedia.org with reason: host reimage
  • 15:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P77548 and previous config saved to /var/cache/conftool/dbconfig/20250610-154245-fceratto.json
  • 15:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P77547 and previous config saved to /var/cache/conftool/dbconfig/20250610-153750-marostegui.json
  • 15:37 dancy@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.5 refs T392175
  • 15:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T395241)', diff saved to https://phabricator.wikimedia.org/P77546 and previous config saved to /var/cache/conftool/dbconfig/20250610-152738-fceratto.json
  • 15:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T396130)', diff saved to https://phabricator.wikimedia.org/P77545 and previous config saved to /var/cache/conftool/dbconfig/20250610-152243-marostegui.json
  • 15:14 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet
  • 15:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2238 (T396130)', diff saved to https://phabricator.wikimedia.org/P77544 and previous config saved to /var/cache/conftool/dbconfig/20250610-150954-marostegui.json
  • 15:09 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2238.codfw.wmnet with reason: Maintenance
  • 15:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T396130)', diff saved to https://phabricator.wikimedia.org/P77543 and previous config saved to /var/cache/conftool/dbconfig/20250610-150931-marostegui.json
  • 15:08 brennen@deploy1003: Finished deploy [phabricator/deployment@f8d7b38]: deploy phab1004 for T396490 (duration: 00m 39s)
  • 15:08 brennen@deploy1003: Started deploy [phabricator/deployment@f8d7b38]: deploy phab1004 for T396490
  • 15:08 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet
  • 15:07 brennen@deploy1003: Finished deploy [phabricator/deployment@f8d7b38]: test deploy phab2002 for T396490 (duration: 00m 40s)
  • 15:07 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1001.eqiad.wmnet
  • 15:07 brennen@deploy1003: Started deploy [phabricator/deployment@f8d7b38]: test deploy phab2002 for T396490
  • 15:07 aokoth@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phorge Update
  • 15:05 aokoth@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phorge Update
  • 15:02 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host install7002.wikimedia.org with OS bookworm
  • 15:02 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1028.eqiad.wmnet
  • 15:02 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
  • 15:01 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1001.eqiad.wmnet
  • 15:01 taavi@dns1004: END - running authdns-update
  • 15:00 taavi@dns1004: START - running authdns-update
  • 14:58 taavi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:58 taavi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add wiki replica cloudlb v6 addresses - taavi@cumin1003"
  • 14:58 taavi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add wiki replica cloudlb v6 addresses - taavi@cumin1003"
  • 14:56 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
  • 14:55 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
  • 14:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P77542 and previous config saved to /var/cache/conftool/dbconfig/20250610-145424-marostegui.json
  • 14:54 taavi@cumin1003: START - Cookbook sre.dns.netbox
  • 14:53 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:53 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc1 T378715', diff saved to https://phabricator.wikimedia.org/P77541 and previous config saved to /var/cache/conftool/dbconfig/20250610-145137-marostegui.json
  • 14:49 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cirrussearch1063.eqiad.wmnet
  • 14:49 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:49 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cirrussearch1063.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
  • 14:49 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cirrussearch1063.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
  • 14:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P77539 and previous config saved to /var/cache/conftool/dbconfig/20250610-143917-marostegui.json
  • 14:36 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:36 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:36 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T395241)', diff saved to https://phabricator.wikimedia.org/P77538 and previous config saved to /var/cache/conftool/dbconfig/20250610-143623-fceratto.json
  • 14:36 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 14:36 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T395241)', diff saved to https://phabricator.wikimedia.org/P77537 and previous config saved to /var/cache/conftool/dbconfig/20250610-143558-fceratto.json
  • 14:29 jmm@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host install7002.wikimedia.org with OS bullseye
  • 14:28 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts cirrussearch1063.eqiad.wmnet
  • 14:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T396130)', diff saved to https://phabricator.wikimedia.org/P77536 and previous config saved to /var/cache/conftool/dbconfig/20250610-142410-marostegui.json
  • 14:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P77535 and previous config saved to /var/cache/conftool/dbconfig/20250610-142051-fceratto.json
  • 14:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2226 (T396130)', diff saved to https://phabricator.wikimedia.org/P77534 and previous config saved to /var/cache/conftool/dbconfig/20250610-142009-marostegui.json
  • 14:20 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2226.codfw.wmnet with reason: Maintenance
  • 14:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T396130)', diff saved to https://phabricator.wikimedia.org/P77533 and previous config saved to /var/cache/conftool/dbconfig/20250610-141946-marostegui.json
  • 14:19 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1028.eqiad.wmnet
  • 14:13 fabfur@dns1004: END - running authdns-update
  • 14:13 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
  • 14:13 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1028.eqiad.wmnet
  • 14:12 fabfur@dns1004: START - running authdns-update
  • 14:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P77532 and previous config saved to /var/cache/conftool/dbconfig/20250610-140544-fceratto.json
  • 14:04 taavi@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1001.eqiad.wmnet
  • 14:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P77531 and previous config saved to /var/cache/conftool/dbconfig/20250610-140439-marostegui.json
  • 13:57 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
  • 13:56 taavi@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudlb1001.eqiad.wmnet
  • 13:55 taavi@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1002.eqiad.wmnet
  • 13:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 13:51 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 13:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 13:50 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 13:50 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1027.eqiad.wmnet
  • 13:50 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet
  • 13:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T395241)', diff saved to https://phabricator.wikimedia.org/P77529 and previous config saved to /var/cache/conftool/dbconfig/20250610-135037-fceratto.json
  • 13:50 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 13:50 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:49 fabfur@dns1004: END - running authdns-update
  • 13:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P77528 and previous config saved to /var/cache/conftool/dbconfig/20250610-134931-marostegui.json
  • 13:48 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 13:48 fabfur@dns1004: START - running authdns-update
  • 13:48 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:47 taavi@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudlb1002.eqiad.wmnet
  • 13:44 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet
  • 13:42 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T395241)', diff saved to https://phabricator.wikimedia.org/P77527 and previous config saved to /var/cache/conftool/dbconfig/20250610-134227-fceratto.json
  • 13:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 13:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T395241)', diff saved to https://phabricator.wikimedia.org/P77526 and previous config saved to /var/cache/conftool/dbconfig/20250610-134202-fceratto.json
  • 13:39 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1027.eqiad.wmnet
  • 13:39 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1026.eqiad.wmnet
  • 13:38 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
  • 13:38 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:37 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:36 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 13:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T396130)', diff saved to https://phabricator.wikimedia.org/P77525 and previous config saved to /var/cache/conftool/dbconfig/20250610-133424-marostegui.json
  • 13:34 sgimeno@deploy1003: Finished scap sync-world: Backport for Enable electionclerk user group on enwiki (T378287), core-Permissions:Restrict editing on cawikimedia to autoconfirmed only (T396178) (duration: 11m 22s)
  • 13:33 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 13:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
  • 13:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc1 T378715', diff saved to https://phabricator.wikimedia.org/P77524 and previous config saved to /var/cache/conftool/dbconfig/20250610-133207-marostegui.json
  • 13:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2011.codfw.wmnet,pc1011.eqiad.wmnet with reason: Maintenance
  • 13:30 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1026.eqiad.wmnet
  • 13:27 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet
  • 13:27 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
  • 13:27 sgimeno@deploy1003: bunnypranav, dreamrimmer, sgimeno: Continuing with sync
  • 13:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P77523 and previous config saved to /var/cache/conftool/dbconfig/20250610-132655-fceratto.json
  • 13:25 sgimeno@deploy1003: bunnypranav, dreamrimmer, sgimeno: Backport for Enable electionclerk user group on enwiki (T378287), core-Permissions:Restrict editing on cawikimedia to autoconfirmed only (T396178) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:22 sgimeno@deploy1003: Started scap sync-world: Backport for Enable electionclerk user group on enwiki (T378287), core-Permissions:Restrict editing on cawikimedia to autoconfirmed only (T396178)
  • 13:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2225 (T396130)', diff saved to https://phabricator.wikimedia.org/P77522 and previous config saved to /var/cache/conftool/dbconfig/20250610-132124-marostegui.json
  • 13:21 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
  • 13:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2225.codfw.wmnet with reason: Maintenance
  • 13:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T396130)', diff saved to https://phabricator.wikimedia.org/P77521 and previous config saved to /var/cache/conftool/dbconfig/20250610-132102-marostegui.json
  • 13:18 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host install7002.wikimedia.org with OS bullseye
  • 13:18 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet
  • 13:17 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host install7002.wikimedia.org with OS bullseye
  • 13:15 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1024.eqiad.wmnet
  • 13:15 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1024.eqiad.wmnet
  • 13:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P77520 and previous config saved to /var/cache/conftool/dbconfig/20250610-131148-fceratto.json
  • 13:09 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet
  • 13:06 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1024.eqiad.wmnet
  • 13:06 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1024.eqiad.wmnet
  • 13:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P77519 and previous config saved to /var/cache/conftool/dbconfig/20250610-130555-marostegui.json
  • 13:00 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:00 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T395241)', diff saved to https://phabricator.wikimedia.org/P77518 and previous config saved to /var/cache/conftool/dbconfig/20250610-125641-fceratto.json
  • 12:54 jakob@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:54 jakob@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 12:54 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host install7002.wikimedia.org with OS bullseye
  • 12:53 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host install7002.wikimedia.org with OS bullseye
  • 12:53 jakob@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:53 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 12:53 jakob@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 12:52 jakob@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:52 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 12:52 jakob@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 12:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P77517 and previous config saved to /var/cache/conftool/dbconfig/20250610-125048-marostegui.json
  • 12:48 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host install7002.wikimedia.org with OS bullseye
  • 12:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T395241)', diff saved to https://phabricator.wikimedia.org/P77516 and previous config saved to /var/cache/conftool/dbconfig/20250610-124835-fceratto.json
  • 12:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 12:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T395241)', diff saved to https://phabricator.wikimedia.org/P77515 and previous config saved to /var/cache/conftool/dbconfig/20250610-124810-fceratto.json
  • 12:47 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1024.eqiad.wmnet
  • 12:41 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1023.eqiad.wmnet
  • 12:41 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
  • 12:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T396130)', diff saved to https://phabricator.wikimedia.org/P77514 and previous config saved to /var/cache/conftool/dbconfig/20250610-123541-marostegui.json
  • 12:35 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
  • 12:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77513 and previous config saved to /var/cache/conftool/dbconfig/20250610-123422-root.json
  • 12:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P77512 and previous config saved to /var/cache/conftool/dbconfig/20250610-123303-fceratto.json
  • 12:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T396130)', diff saved to https://phabricator.wikimedia.org/P77511 and previous config saved to /var/cache/conftool/dbconfig/20250610-123140-marostegui.json
  • 12:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 12:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T396130)', diff saved to https://phabricator.wikimedia.org/P77510 and previous config saved to /var/cache/conftool/dbconfig/20250610-123117-marostegui.json
  • 12:27 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1023.eqiad.wmnet
  • 12:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) openstack.eqiad1.wikimediacloud.org on all recursors
  • 12:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache openstack.eqiad1.wikimediacloud.org on all recursors
  • 12:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77509 and previous config saved to /var/cache/conftool/dbconfig/20250610-121917-root.json
  • 12:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P77508 and previous config saved to /var/cache/conftool/dbconfig/20250610-121756-fceratto.json
  • 12:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P77507 and previous config saved to /var/cache/conftool/dbconfig/20250610-121610-marostegui.json
  • 12:15 Ammar: Ran fixStuckGlobalRename.php for T396371 and T396452
  • 12:13 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:13 taavi@dns1004: END - running authdns-update
  • 12:13 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:12 taavi@dns1004: START - running authdns-update
  • 12:11 taavi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:11 taavi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add AAAA record for openstack.eqiad1.wikimediacloud.org - taavi@cumin1003"
  • 12:10 taavi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add AAAA record for openstack.eqiad1.wikimediacloud.org - taavi@cumin1003"
  • 12:06 taavi@cumin1003: START - Cookbook sre.dns.netbox
  • 12:06 taavi@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:06 jmm@dns1004: END - running authdns-update
  • 12:05 jmm@dns1004: START - running authdns-update
  • 12:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1168 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77506 and previous config saved to /var/cache/conftool/dbconfig/20250610-120412-root.json
  • 12:03 taavi@cumin1003: START - Cookbook sre.dns.netbox
  • 12:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T395241)', diff saved to https://phabricator.wikimedia.org/P77505 and previous config saved to /var/cache/conftool/dbconfig/20250610-120249-fceratto.json
  • 12:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P77504 and previous config saved to /var/cache/conftool/dbconfig/20250610-120103-marostegui.json
  • 11:59 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host install7002.wikimedia.org
  • 11:54 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T395241)', diff saved to https://phabricator.wikimedia.org/P77503 and previous config saved to /var/cache/conftool/dbconfig/20250610-115444-fceratto.json
  • 11:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 11:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T395241)', diff saved to https://phabricator.wikimedia.org/P77502 and previous config saved to /var/cache/conftool/dbconfig/20250610-115419-fceratto.json
  • 11:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1168 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77501 and previous config saved to /var/cache/conftool/dbconfig/20250610-114906-root.json
  • 11:48 moritzm: installing qemu bugfix updates
  • 11:47 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:47 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org
  • 11:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77500 and previous config saved to /var/cache/conftool/dbconfig/20250610-114617-root.json
  • 11:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T396130)', diff saved to https://phabricator.wikimedia.org/P77499 and previous config saved to /var/cache/conftool/dbconfig/20250610-114556-marostegui.json
  • 11:44 cgoubert@deploy1003: Finished scap sync-world: mediawiki-cli: Fix the paths of some of the dumps scripts and config files - T394389 (duration: 08m 49s)
  • 11:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P77497 and previous config saved to /var/cache/conftool/dbconfig/20250610-113913-fceratto.json
  • 11:37 moritzm: failover Ganeti master in codfw to ganeti2032
  • 11:35 cgoubert@deploy1003: Started scap sync-world: mediawiki-cli: Fix the paths of some of the dumps scripts and config files - T394389
  • 11:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1168 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77495 and previous config saved to /var/cache/conftool/dbconfig/20250610-113401-root.json
  • 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T396130)', diff saved to https://phabricator.wikimedia.org/P77494 and previous config saved to /var/cache/conftool/dbconfig/20250610-113328-marostegui.json
  • 11:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T396130)', diff saved to https://phabricator.wikimedia.org/P77493 and previous config saved to /var/cache/conftool/dbconfig/20250610-113306-marostegui.json
  • 11:31 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2044.codfw.wmnet
  • 11:31 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2044.codfw.wmnet
  • 11:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77492 and previous config saved to /var/cache/conftool/dbconfig/20250610-113112-root.json
  • 11:26 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2044.codfw.wmnet
  • 11:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P77491 and previous config saved to /var/cache/conftool/dbconfig/20250610-112406-fceratto.json
  • 11:21 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2044.codfw.wmnet
  • 11:21 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2043.codfw.wmnet
  • 11:20 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2043.codfw.wmnet
  • 11:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77490 and previous config saved to /var/cache/conftool/dbconfig/20250610-111856-root.json
  • 11:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P77489 and previous config saved to /var/cache/conftool/dbconfig/20250610-111759-marostegui.json
  • 11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77488 and previous config saved to /var/cache/conftool/dbconfig/20250610-111606-root.json
  • 11:15 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2043.codfw.wmnet
  • 11:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 11:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1168 T395989', diff saved to https://phabricator.wikimedia.org/P77487 and previous config saved to /var/cache/conftool/dbconfig/20250610-111440-marostegui.json
  • 11:10 marostegui@cumin1002: dbctl commit (dc=all): 'es2033 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77486 and previous config saved to /var/cache/conftool/dbconfig/20250610-111054-root.json
  • 11:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T395241)', diff saved to https://phabricator.wikimedia.org/P77485 and previous config saved to /var/cache/conftool/dbconfig/20250610-110859-fceratto.json
  • 11:04 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 11:04 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 11:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P77484 and previous config saved to /var/cache/conftool/dbconfig/20250610-110252-marostegui.json
  • 11:01 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 11:01 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 11:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77483 and previous config saved to /var/cache/conftool/dbconfig/20250610-110101-root.json
  • 10:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T395241)', diff saved to https://phabricator.wikimedia.org/P77482 and previous config saved to /var/cache/conftool/dbconfig/20250610-105951-fceratto.json
  • 10:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 10:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T395241)', diff saved to https://phabricator.wikimedia.org/P77481 and previous config saved to /var/cache/conftool/dbconfig/20250610-105926-fceratto.json
  • 10:59 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77480 and previous config saved to /var/cache/conftool/dbconfig/20250610-105911-root.json
  • 10:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2033 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77479 and previous config saved to /var/cache/conftool/dbconfig/20250610-105548-root.json
  • 10:53 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2043.codfw.wmnet
  • 10:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T396130)', diff saved to https://phabricator.wikimedia.org/P77478 and previous config saved to /var/cache/conftool/dbconfig/20250610-104745-marostegui.json
  • 10:47 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2042.codfw.wmnet
  • 10:47 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2042.codfw.wmnet
  • 10:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77477 and previous config saved to /var/cache/conftool/dbconfig/20250610-104556-root.json
  • 10:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77476 and previous config saved to /var/cache/conftool/dbconfig/20250610-104449-root.json
  • 10:44 taavi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 'private.eqiad.wikimedia.cloud$' on eqiad recursors
  • 10:44 taavi@cumin1003: START - Cookbook sre.dns.wipe-cache 'private.eqiad.wikimedia.cloud$' on eqiad recursors
  • 10:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P77475 and previous config saved to /var/cache/conftool/dbconfig/20250610-104419-fceratto.json
  • 10:44 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77474 and previous config saved to /var/cache/conftool/dbconfig/20250610-104406-root.json
  • 10:43 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:42 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 10:42 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:42 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:42 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Change AAAA records for eqiad cloudsw cloud-private GW IRB address - cmooney@cumin1003"
  • 10:42 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Change AAAA records for eqiad cloudsw cloud-private GW IRB address - cmooney@cumin1003"
  • 10:42 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 10:42 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 10:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1180 T395989', diff saved to https://phabricator.wikimedia.org/P77473 and previous config saved to /var/cache/conftool/dbconfig/20250610-104143-marostegui.json
  • 10:41 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2042.codfw.wmnet
  • 10:40 marostegui@cumin1002: dbctl commit (dc=all): 'es2033 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77472 and previous config saved to /var/cache/conftool/dbconfig/20250610-104043-root.json
  • 10:39 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 10:33 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2042.codfw.wmnet
  • 10:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T396130)', diff saved to https://phabricator.wikimedia.org/P77471 and previous config saved to /var/cache/conftool/dbconfig/20250610-103315-marostegui.json
  • 10:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 10:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T396130)', diff saved to https://phabricator.wikimedia.org/P77470 and previous config saved to /var/cache/conftool/dbconfig/20250610-103252-marostegui.json
  • 10:31 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2041.codfw.wmnet
  • 10:31 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2041.codfw.wmnet
  • 10:31 taavi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 'private.eqiad.wikimedia.cloud$' on eqiad recursors
  • 10:31 taavi@cumin1003: START - Cookbook sre.dns.wipe-cache 'private.eqiad.wikimedia.cloud$' on eqiad recursors
  • 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add AAAA records for WMCS cloud-private IPs in eqiad - cmooney@cumin1003"
  • 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add AAAA records for WMCS cloud-private IPs in eqiad - cmooney@cumin1003"
  • 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77469 and previous config saved to /var/cache/conftool/dbconfig/20250610-102943-root.json
  • 10:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P77468 and previous config saved to /var/cache/conftool/dbconfig/20250610-102913-fceratto.json
  • 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77467 and previous config saved to /var/cache/conftool/dbconfig/20250610-102900-root.json
  • 10:27 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 10:26 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2041.codfw.wmnet
  • 10:25 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:25 marostegui@cumin1002: dbctl commit (dc=all): 'es2033 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77466 and previous config saved to /var/cache/conftool/dbconfig/20250610-102538-root.json
  • 10:22 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 10:17 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2041.codfw.wmnet
  • 10:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P77465 and previous config saved to /var/cache/conftool/dbconfig/20250610-101745-marostegui.json
  • 10:17 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2040.codfw.wmnet
  • 10:17 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2040.codfw.wmnet
  • 10:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77464 and previous config saved to /var/cache/conftool/dbconfig/20250610-101438-root.json
  • 10:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T395241)', diff saved to https://phabricator.wikimedia.org/P77463 and previous config saved to /var/cache/conftool/dbconfig/20250610-101406-fceratto.json
  • 10:13 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77462 and previous config saved to /var/cache/conftool/dbconfig/20250610-101355-root.json
  • 10:12 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2040.codfw.wmnet
  • 10:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 10:10 marostegui@cumin1002: dbctl commit (dc=all): 'es2033 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77461 and previous config saved to /var/cache/conftool/dbconfig/20250610-101032-root.json
  • 10:08 moritzm: installing jinja2 security updates
  • 10:08 moritzm: installing jinja2 security updates
  • 10:06 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2040.codfw.wmnet
  • 10:05 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T395241)', diff saved to https://phabricator.wikimedia.org/P77460 and previous config saved to /var/cache/conftool/dbconfig/20250610-100558-fceratto.json
  • 10:05 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 10:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T395241)', diff saved to https://phabricator.wikimedia.org/P77459 and previous config saved to /var/cache/conftool/dbconfig/20250610-100532-fceratto.json
  • 10:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P77458 and previous config saved to /var/cache/conftool/dbconfig/20250610-100239-marostegui.json
  • 10:01 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2039.codfw.wmnet
  • 10:00 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2039.codfw.wmnet
  • 09:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77457 and previous config saved to /var/cache/conftool/dbconfig/20250610-095933-root.json
  • 09:58 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77456 and previous config saved to /var/cache/conftool/dbconfig/20250610-095850-root.json
  • 09:58 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:55 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2039.codfw.wmnet
  • 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2033 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77455 and previous config saved to /var/cache/conftool/dbconfig/20250610-095527-root.json
  • 09:54 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es2033.codfw.wmnet
  • 09:54 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 09:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P77454 and previous config saved to /var/cache/conftool/dbconfig/20250610-095025-fceratto.json
  • 09:50 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2039.codfw.wmnet
  • 09:49 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2038.codfw.wmnet
  • 09:49 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2038.codfw.wmnet
  • 09:48 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet
  • 09:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T396130)', diff saved to https://phabricator.wikimedia.org/P77453 and previous config saved to /var/cache/conftool/dbconfig/20250610-094731-marostegui.json
  • 09:46 moritzm: installing postgresql-15 security updates
  • 09:45 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2033 - Upgrading es2033.codfw.wmnet
  • 09:45 marostegui@cumin1002: START - Cookbook sre.mysql.depool es2033 - Upgrading es2033.codfw.wmnet
  • 09:44 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for es2033.codfw.wmnet
  • 09:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77450 and previous config saved to /var/cache/conftool/dbconfig/20250610-094429-root.json
  • 09:44 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2031 to es2 master T395241', diff saved to https://phabricator.wikimedia.org/P77449 and previous config saved to /var/cache/conftool/dbconfig/20250610-094401-root.json
  • 09:44 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2038.codfw.wmnet
  • 09:43 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77448 and previous config saved to /var/cache/conftool/dbconfig/20250610-094345-root.json
  • 09:43 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es2032.codfw.wmnet
  • 09:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 09:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1187 T395989', diff saved to https://phabricator.wikimedia.org/P77447 and previous config saved to /var/cache/conftool/dbconfig/20250610-093846-marostegui.json
  • 09:38 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 09:37 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 09:37 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2032 - Upgrading es2032.codfw.wmnet
  • 09:37 marostegui@cumin1002: START - Cookbook sre.mysql.depool es2032 - Upgrading es2032.codfw.wmnet
  • 09:36 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for es2032.codfw.wmnet
  • 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2030 to es1 master T395241', diff saved to https://phabricator.wikimedia.org/P77445 and previous config saved to /var/cache/conftool/dbconfig/20250610-093628-root.json
  • 09:36 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P77444 and previous config saved to /var/cache/conftool/dbconfig/20250610-093518-fceratto.json
  • 09:33 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2038.codfw.wmnet
  • 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T396130)', diff saved to https://phabricator.wikimedia.org/P77443 and previous config saved to /var/cache/conftool/dbconfig/20250610-093252-marostegui.json
  • 09:32 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 09:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 09:31 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:27 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:26 fnegri@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1019.eqiad.wmnet with reason: Upgrading clouddbs T394372
  • 09:26 jynus: upgrade db2197 to MariaDB 10.11 T394487
  • 09:24 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet
  • 09:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T395241)', diff saved to https://phabricator.wikimedia.org/P77442 and previous config saved to /var/cache/conftool/dbconfig/20250610-092011-fceratto.json
  • 09:17 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2197.codfw.wmnet,dbprov2003.codfw.wmnet with reason: Downtime hosts for MariaDB 10.11 upgrade
  • 09:15 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:10 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T395241)', diff saved to https://phabricator.wikimedia.org/P77441 and previous config saved to /var/cache/conftool/dbconfig/20250610-091040-fceratto.json
  • 09:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T395241)', diff saved to https://phabricator.wikimedia.org/P77440 and previous config saved to /var/cache/conftool/dbconfig/20250610-091016-fceratto.json
  • 09:08 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 09:07 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 09:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 09:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T396130)', diff saved to https://phabricator.wikimedia.org/P77439 and previous config saved to /var/cache/conftool/dbconfig/20250610-090635-marostegui.json
  • 08:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P77438 and previous config saved to /var/cache/conftool/dbconfig/20250610-085508-fceratto.json
  • 08:54 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:53 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:53 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P77437 and previous config saved to /var/cache/conftool/dbconfig/20250610-085128-marostegui.json
  • 08:40 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P77436 and previous config saved to /var/cache/conftool/dbconfig/20250610-084002-fceratto.json
  • 08:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P77435 and previous config saved to /var/cache/conftool/dbconfig/20250610-083622-marostegui.json
  • 08:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T395241)', diff saved to https://phabricator.wikimedia.org/P77434 and previous config saved to /var/cache/conftool/dbconfig/20250610-082454-fceratto.json
  • 08:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T396130)', diff saved to https://phabricator.wikimedia.org/P77433 and previous config saved to /var/cache/conftool/dbconfig/20250610-082114-marostegui.json
  • 08:18 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T396130)', diff saved to https://phabricator.wikimedia.org/P77432 and previous config saved to /var/cache/conftool/dbconfig/20250610-081817-marostegui.json
  • 08:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 08:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T396130)', diff saved to https://phabricator.wikimedia.org/P77431 and previous config saved to /var/cache/conftool/dbconfig/20250610-081756-marostegui.json
  • 08:16 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T395241)', diff saved to https://phabricator.wikimedia.org/P77430 and previous config saved to /var/cache/conftool/dbconfig/20250610-081647-fceratto.json
  • 08:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32098
  • 08:05 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 32098
  • 08:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P77429 and previous config saved to /var/cache/conftool/dbconfig/20250610-080248-marostegui.json
  • 08:01 jynus: deploying grants for zuul backups @ m1 T394844
  • 07:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P77428 and previous config saved to /var/cache/conftool/dbconfig/20250610-074742-marostegui.json
  • 07:36 marostegui@cumin1002: dbctl commit (dc=all): 'es2031 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77427 and previous config saved to /var/cache/conftool/dbconfig/20250610-073631-root.json
  • 07:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T396130)', diff saved to https://phabricator.wikimedia.org/P77426 and previous config saved to /var/cache/conftool/dbconfig/20250610-073234-marostegui.json
  • 07:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T396130)', diff saved to https://phabricator.wikimedia.org/P77425 and previous config saved to /var/cache/conftool/dbconfig/20250610-073003-marostegui.json
  • 07:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 07:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T396130)', diff saved to https://phabricator.wikimedia.org/P77424 and previous config saved to /var/cache/conftool/dbconfig/20250610-072941-marostegui.json
  • 07:28 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8849
  • 07:28 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 8849
  • 07:26 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28173
  • 07:26 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 28173
  • 07:25 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 60427
  • 07:25 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 60427
  • 07:24 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 10310
  • 07:21 marostegui@cumin1002: dbctl commit (dc=all): 'es2031 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77423 and previous config saved to /var/cache/conftool/dbconfig/20250610-072125-root.json
  • 07:21 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 10310
  • 07:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P77422 and previous config saved to /var/cache/conftool/dbconfig/20250610-071434-marostegui.json
  • 07:14 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install7002.wikimedia.org
  • 07:14 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host install7002.wikimedia.org with OS bookworm
  • 07:06 marostegui@cumin1002: dbctl commit (dc=all): 'es2031 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77421 and previous config saved to /var/cache/conftool/dbconfig/20250610-070620-root.json
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77420 and previous config saved to /var/cache/conftool/dbconfig/20250610-070240-root.json
  • 06:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P77419 and previous config saved to /var/cache/conftool/dbconfig/20250610-065927-marostegui.json
  • 06:57 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install7002.wikimedia.org with reason: host reimage
  • 06:53 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on install7002.wikimedia.org with reason: host reimage
  • 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77418 and previous config saved to /var/cache/conftool/dbconfig/20250610-065303-root.json
  • 06:52 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast7001.wikimedia.org
  • 06:52 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:52 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast7001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 06:52 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast7001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 06:51 marostegui@cumin1002: dbctl commit (dc=all): 'es2031 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77417 and previous config saved to /var/cache/conftool/dbconfig/20250610-065114-root.json
  • 06:47 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77416 and previous config saved to /var/cache/conftool/dbconfig/20250610-064735-root.json
  • 06:46 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77415 and previous config saved to /var/cache/conftool/dbconfig/20250610-064615-root.json
  • 06:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T396130)', diff saved to https://phabricator.wikimedia.org/P77414 and previous config saved to /var/cache/conftool/dbconfig/20250610-064420-marostegui.json
  • 06:39 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77413 and previous config saved to /var/cache/conftool/dbconfig/20250610-063757-root.json
  • 06:36 marostegui@cumin1002: dbctl commit (dc=all): 'es2031 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77412 and previous config saved to /var/cache/conftool/dbconfig/20250610-063608-root.json
  • 06:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T396130)', diff saved to https://phabricator.wikimedia.org/P77411 and previous config saved to /var/cache/conftool/dbconfig/20250610-063547-marostegui.json
  • 06:35 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 06:35 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es2031.codfw.wmnet
  • 06:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T396130)', diff saved to https://phabricator.wikimedia.org/P77410 and previous config saved to /var/cache/conftool/dbconfig/20250610-063524-marostegui.json
  • 06:35 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts bast7001.wikimedia.org
  • 06:32 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77409 and previous config saved to /var/cache/conftool/dbconfig/20250610-063229-root.json
  • 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77408 and previous config saved to /var/cache/conftool/dbconfig/20250610-063110-root.json
  • 06:28 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host install7002.wikimedia.org with OS bookworm
  • 06:25 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install7002.wikimedia.org - jmm@cumin1003"
  • 06:25 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install7002.wikimedia.org - jmm@cumin1003"
  • 06:25 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2031 - Upgrading es2031.codfw.wmnet
  • 06:25 marostegui@cumin1002: START - Cookbook sre.mysql.depool es2031 - Upgrading es2031.codfw.wmnet
  • 06:25 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install7002.wikimedia.org on all recursors
  • 06:25 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache install7002.wikimedia.org on all recursors
  • 06:25 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:25 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install7002.wikimedia.org - jmm@cumin1003"
  • 06:25 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install7002.wikimedia.org - jmm@cumin1003"
  • 06:25 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for es2031.codfw.wmnet
  • 06:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2031', diff saved to https://phabricator.wikimedia.org/P77407 and previous config saved to /var/cache/conftool/dbconfig/20250610-062501-marostegui.json
  • 06:22 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77406 and previous config saved to /var/cache/conftool/dbconfig/20250610-062252-root.json
  • 06:21 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 06:21 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host install7002.wikimedia.org
  • 06:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P77405 and previous config saved to /var/cache/conftool/dbconfig/20250610-062017-marostegui.json
  • 06:17 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77404 and previous config saved to /var/cache/conftool/dbconfig/20250610-061724-root.json
  • 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77403 and previous config saved to /var/cache/conftool/dbconfig/20250610-061604-root.json
  • 06:07 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77402 and previous config saved to /var/cache/conftool/dbconfig/20250610-060746-root.json
  • 06:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77401 and previous config saved to /var/cache/conftool/dbconfig/20250610-060638-root.json
  • 06:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P77400 and previous config saved to /var/cache/conftool/dbconfig/20250610-060510-marostegui.json
  • 06:02 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77399 and previous config saved to /var/cache/conftool/dbconfig/20250610-060218-root.json
  • 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77398 and previous config saved to /var/cache/conftool/dbconfig/20250610-060059-root.json
  • 05:52 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77397 and previous config saved to /var/cache/conftool/dbconfig/20250610-055241-root.json
  • 05:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77396 and previous config saved to /var/cache/conftool/dbconfig/20250610-055132-root.json
  • 05:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T396130)', diff saved to https://phabricator.wikimedia.org/P77395 and previous config saved to /var/cache/conftool/dbconfig/20250610-055003-marostegui.json
  • 05:47 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77394 and previous config saved to /var/cache/conftool/dbconfig/20250610-054713-root.json
  • 05:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T396130)', diff saved to https://phabricator.wikimedia.org/P77393 and previous config saved to /var/cache/conftool/dbconfig/20250610-054705-marostegui.json
  • 05:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 05:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 05:46 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es2030.codfw.wmnet
  • 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T396130)', diff saved to https://phabricator.wikimedia.org/P77392 and previous config saved to /var/cache/conftool/dbconfig/20250610-054635-marostegui.json
  • 05:45 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77391 and previous config saved to /var/cache/conftool/dbconfig/20250610-054554-root.json
  • 05:39 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2030 - Upgrading es2030.codfw.wmnet
  • 05:39 marostegui@cumin1002: START - Cookbook sre.mysql.depool es2030 - Upgrading es2030.codfw.wmnet
  • 05:39 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for es2030.codfw.wmnet
  • 05:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2030.codfw.wmnet with reason: Maintenance
  • 05:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2030', diff saved to https://phabricator.wikimedia.org/P77390 and previous config saved to /var/cache/conftool/dbconfig/20250610-053902-marostegui.json
  • 05:37 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77389 and previous config saved to /var/cache/conftool/dbconfig/20250610-053735-root.json
  • 05:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77388 and previous config saved to /var/cache/conftool/dbconfig/20250610-053627-root.json
  • 05:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2034.codfw.wmnet with reason: Maintenance
  • 05:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P77387 and previous config saved to /var/cache/conftool/dbconfig/20250610-053128-marostegui.json
  • 05:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2034', diff saved to https://phabricator.wikimedia.org/P77386 and previous config saved to /var/cache/conftool/dbconfig/20250610-053119-marostegui.json
  • 05:30 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77385 and previous config saved to /var/cache/conftool/dbconfig/20250610-053048-root.json
  • 05:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1036', diff saved to https://phabricator.wikimedia.org/P77384 and previous config saved to /var/cache/conftool/dbconfig/20250610-052155-marostegui.json
  • 05:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1036.eqiad.wmnet with reason: Maintenance
  • 05:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77383 and previous config saved to /var/cache/conftool/dbconfig/20250610-052122-root.json
  • 05:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P77382 and previous config saved to /var/cache/conftool/dbconfig/20250610-051614-marostegui.json
  • 05:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77381 and previous config saved to /var/cache/conftool/dbconfig/20250610-050616-root.json
  • 05:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1231', diff saved to https://phabricator.wikimedia.org/P77380 and previous config saved to /var/cache/conftool/dbconfig/20250610-050215-marostegui.json
  • 05:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 05:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T396130)', diff saved to https://phabricator.wikimedia.org/P77379 and previous config saved to /var/cache/conftool/dbconfig/20250610-050107-marostegui.json
  • 04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1159 (T396130)', diff saved to https://phabricator.wikimedia.org/P77378 and previous config saved to /var/cache/conftool/dbconfig/20250610-045809-marostegui.json
  • 04:58 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1159.eqiad.wmnet with reason: Maintenance
  • 04:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 04:04 mwpresync@deploy1003: Pruned MediaWiki: 1.45.0-wmf.2 (duration: 04m 22s)

2025-06-09

  • 23:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228 (T396130)', diff saved to https://phabricator.wikimedia.org/P77377 and previous config saved to /var/cache/conftool/dbconfig/20250609-235425-marostegui.json
  • 23:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P77376 and previous config saved to /var/cache/conftool/dbconfig/20250609-233918-marostegui.json
  • 23:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P77375 and previous config saved to /var/cache/conftool/dbconfig/20250609-232410-marostegui.json
  • 23:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228 (T396130)', diff saved to https://phabricator.wikimedia.org/P77373 and previous config saved to /var/cache/conftool/dbconfig/20250609-230903-marostegui.json
  • 23:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2228 (T396130)', diff saved to https://phabricator.wikimedia.org/P77372 and previous config saved to /var/cache/conftool/dbconfig/20250609-230518-marostegui.json
  • 23:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2228.codfw.wmnet with reason: Maintenance
  • 23:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T396130)', diff saved to https://phabricator.wikimedia.org/P77371 and previous config saved to /var/cache/conftool/dbconfig/20250609-230454-marostegui.json
  • 22:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P77370 and previous config saved to /var/cache/conftool/dbconfig/20250609-224947-marostegui.json
  • 22:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
  • 22:36 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cirrussearch2110.codfw.wmnet
  • 22:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P77369 and previous config saved to /var/cache/conftool/dbconfig/20250609-223439-marostegui.json
  • 22:29 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host cirrussearch2110.codfw.wmnet
  • 22:19 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cirrussearch2111.codfw.wmnet
  • 22:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T396130)', diff saved to https://phabricator.wikimedia.org/P77368 and previous config saved to /var/cache/conftool/dbconfig/20250609-221932-marostegui.json
  • 22:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2223 (T396130)', diff saved to https://phabricator.wikimedia.org/P77367 and previous config saved to /var/cache/conftool/dbconfig/20250609-221524-marostegui.json
  • 22:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2223.codfw.wmnet with reason: Maintenance
  • 22:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T396130)', diff saved to https://phabricator.wikimedia.org/P77366 and previous config saved to /var/cache/conftool/dbconfig/20250609-221501-marostegui.json
  • 22:12 maryum: Deployed security fix for T395063
  • 22:12 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host cirrussearch2111.codfw.wmnet
  • 22:08 maryum: Deployed security fix for T396230
  • 22:04 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cirrussearch2112.codfw.wmnet
  • 22:01 maryum: Deployed security fix for T395730
  • 21:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P77365 and previous config saved to /var/cache/conftool/dbconfig/20250609-215953-marostegui.json
  • 21:56 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host cirrussearch2112.codfw.wmnet
  • 21:56 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cirrussearch2114.codfw.wmnet
  • 21:49 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:49 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:49 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host cirrussearch2114.codfw.wmnet
  • 21:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cirrussearch2115.codfw.wmnet
  • 21:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P77364 and previous config saved to /var/cache/conftool/dbconfig/20250609-214446-marostegui.json
  • 21:41 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host cirrussearch2115.codfw.wmnet
  • 21:36 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cirrussearch2115.codfw.wmnet
  • 21:36 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cirrussearch2115.codfw.wmnet
  • 21:35 ladsgroup@deploy1003: Finished scap sync-world: Backport for Restrict event page decoration to currently allowed namespaces (T392784) (duration: 11m 07s)
  • 21:33 vriley@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-worker1186
  • 21:33 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1186
  • 21:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T396130)', diff saved to https://phabricator.wikimedia.org/P77363 and previous config saved to /var/cache/conftool/dbconfig/20250609-212939-marostegui.json
  • 21:29 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:29 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:28 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 21:28 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
  • 21:27 vriley@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-worker1186
  • 21:27 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:27 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1186
  • 21:26 ladsgroup@deploy1003: ladsgroup: Backport for Restrict event page decoration to currently allowed namespaces (T392784) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T396130)', diff saved to https://phabricator.wikimedia.org/P77362 and previous config saved to /var/cache/conftool/dbconfig/20250609-212531-marostegui.json
  • 21:25 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 21:24 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 21:24 ladsgroup@deploy1003: Started scap sync-world: Backport for Restrict event page decoration to currently allowed namespaces (T392784)
  • 21:23 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 21:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T396130)', diff saved to https://phabricator.wikimedia.org/P77361 and previous config saved to /var/cache/conftool/dbconfig/20250609-212253-marostegui.json
  • 21:21 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cirrussearch2114.codfw.wmnet
  • 21:19 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cirrussearch2113.codfw.wmnet
  • 21:19 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cirrussearch2113.codfw.wmnet
  • 21:18 eileen: config revision changed from 8acfbae4 to 37a2c896
  • 21:12 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host cirrussearch2114.codfw.wmnet
  • 21:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cirrussearch2115.codfw.wmnet
  • 21:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P77360 and previous config saved to /var/cache/conftool/dbconfig/20250609-210746-marostegui.json
  • 21:03 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:03 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:01 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host cirrussearch2115.codfw.wmnet
  • 20:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P77359 and previous config saved to /var/cache/conftool/dbconfig/20250609-205239-marostegui.json
  • 20:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T396130)', diff saved to https://phabricator.wikimedia.org/P77358 and previous config saved to /var/cache/conftool/dbconfig/20250609-203733-marostegui.json
  • 20:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T396130)', diff saved to https://phabricator.wikimedia.org/P77357 and previous config saved to /var/cache/conftool/dbconfig/20250609-203448-marostegui.json
  • 20:34 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 20:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T396130)', diff saved to https://phabricator.wikimedia.org/P77356 and previous config saved to /var/cache/conftool/dbconfig/20250609-203425-marostegui.json
  • 20:31 jsn@deploy1003: Finished scap sync-world: Backport for Deploy remaining Patroller Tools surveys (T396250) (duration: 13m 15s)
  • 20:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1252.eqiad.wmnet with reason: Maintenance
  • 20:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T395241)', diff saved to https://phabricator.wikimedia.org/P77355 and previous config saved to /var/cache/conftool/dbconfig/20250609-202723-fceratto.json
  • 20:24 jsn@deploy1003: jsn: Continuing with sync
  • 20:20 jsn@deploy1003: jsn: Backport for Deploy remaining Patroller Tools surveys (T396250) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P77354 and previous config saved to /var/cache/conftool/dbconfig/20250609-201918-marostegui.json
  • 20:18 jsn@deploy1003: Started scap sync-world: Backport for Deploy remaining Patroller Tools surveys (T396250)
  • 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for Disable VipsScaler in group1 (T290759) (duration: 10m 23s)
  • 20:14 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 20:13 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 20:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P77353 and previous config saved to /var/cache/conftool/dbconfig/20250609-201216-fceratto.json
  • 20:11 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 20:09 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 20:07 arlolra@deploy1003: arlolra: Continuing with sync
  • 20:05 arlolra@deploy1003: arlolra: Backport for Disable VipsScaler in group1 (T290759) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P77352 and previous config saved to /var/cache/conftool/dbconfig/20250609-200411-marostegui.json
  • 20:03 arlolra@deploy1003: Started scap sync-world: Backport for Disable VipsScaler in group1 (T290759)
  • 20:02 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 20:01 bd808@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 19:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P77351 and previous config saved to /var/cache/conftool/dbconfig/20250609-195709-fceratto.json
  • 19:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T396130)', diff saved to https://phabricator.wikimedia.org/P77350 and previous config saved to /var/cache/conftool/dbconfig/20250609-194904-marostegui.json
  • 19:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T396130)', diff saved to https://phabricator.wikimedia.org/P77349 and previous config saved to /var/cache/conftool/dbconfig/20250609-194520-marostegui.json
  • 19:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 19:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T396130)', diff saved to https://phabricator.wikimedia.org/P77348 and previous config saved to /var/cache/conftool/dbconfig/20250609-194456-marostegui.json
  • 19:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T395241)', diff saved to https://phabricator.wikimedia.org/P77347 and previous config saved to /var/cache/conftool/dbconfig/20250609-194203-fceratto.json
  • 19:33 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T395241)', diff saved to https://phabricator.wikimedia.org/P77346 and previous config saved to /var/cache/conftool/dbconfig/20250609-193354-fceratto.json
  • 19:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 19:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T395241)', diff saved to https://phabricator.wikimedia.org/P77345 and previous config saved to /var/cache/conftool/dbconfig/20250609-193329-fceratto.json
  • 19:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P77344 and previous config saved to /var/cache/conftool/dbconfig/20250609-192949-marostegui.json
  • 19:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P77343 and previous config saved to /var/cache/conftool/dbconfig/20250609-191823-fceratto.json
  • 19:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P77342 and previous config saved to /var/cache/conftool/dbconfig/20250609-191442-marostegui.json
  • 19:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P77341 and previous config saved to /var/cache/conftool/dbconfig/20250609-190316-fceratto.json
  • 18:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T396130)', diff saved to https://phabricator.wikimedia.org/P77340 and previous config saved to /var/cache/conftool/dbconfig/20250609-185935-marostegui.json
  • 18:59 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol2004-dev.codfw.wmnet
  • 18:59 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:59 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 18:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T396130)', diff saved to https://phabricator.wikimedia.org/P77339 and previous config saved to /var/cache/conftool/dbconfig/20250609-185525-marostegui.json
  • 18:55 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 18:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T396130)', diff saved to https://phabricator.wikimedia.org/P77338 and previous config saved to /var/cache/conftool/dbconfig/20250609-185502-marostegui.json
  • 18:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T395241)', diff saved to https://phabricator.wikimedia.org/P77337 and previous config saved to /var/cache/conftool/dbconfig/20250609-184809-fceratto.json
  • 18:47 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 18:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P77336 and previous config saved to /var/cache/conftool/dbconfig/20250609-183955-marostegui.json
  • 18:39 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T395241)', diff saved to https://phabricator.wikimedia.org/P77335 and previous config saved to /var/cache/conftool/dbconfig/20250609-183915-fceratto.json
  • 18:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 18:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T395241)', diff saved to https://phabricator.wikimedia.org/P77334 and previous config saved to /var/cache/conftool/dbconfig/20250609-183850-fceratto.json
  • 18:37 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 18:31 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2004-dev.codfw.wmnet
  • 18:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P77333 and previous config saved to /var/cache/conftool/dbconfig/20250609-182448-marostegui.json
  • 18:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P77332 and previous config saved to /var/cache/conftool/dbconfig/20250609-182343-fceratto.json
  • 18:22 hmonroy@deploy1003: Finished scap sync-world: Backport for Enable Codex and Multiblocks by default (T377121) (duration: 16m 57s)
  • 18:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:15 hmonroy@deploy1003: hmonroy: Continuing with sync
  • 18:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:12 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T396130)', diff saved to https://phabricator.wikimedia.org/P77331 and previous config saved to /var/cache/conftool/dbconfig/20250609-180941-marostegui.json
  • 18:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:09 hmonroy@deploy1003: hmonroy: Backport for Enable Codex and Multiblocks by default (T377121) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P77330 and previous config saved to /var/cache/conftool/dbconfig/20250609-180836-fceratto.json
  • 18:05 hmonroy@deploy1003: Started scap sync-world: Backport for Enable Codex and Multiblocks by default (T377121)
  • 18:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T396130)', diff saved to https://phabricator.wikimedia.org/P77329 and previous config saved to /var/cache/conftool/dbconfig/20250609-180530-marostegui.json
  • 18:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 18:03 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 18:03 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2044
  • 17:59 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2044
  • 17:58 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 17:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T396130)', diff saved to https://phabricator.wikimedia.org/P77328 and previous config saved to /var/cache/conftool/dbconfig/20250609-175747-marostegui.json
  • 17:55 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T395241)', diff saved to https://phabricator.wikimedia.org/P77327 and previous config saved to /var/cache/conftool/dbconfig/20250609-175330-fceratto.json
  • 17:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 17:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cp2044 to codfw - jhancock@cumin2002"
  • 17:52 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cp2044 to codfw - jhancock@cumin2002"
  • 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:49 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Maintenance
  • 17:49 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 17:46 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:45 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T395241)', diff saved to https://phabricator.wikimedia.org/P77326 and previous config saved to /var/cache/conftool/dbconfig/20250609-174523-fceratto.json
  • 17:45 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 17:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T395241)', diff saved to https://phabricator.wikimedia.org/P77325 and previous config saved to /var/cache/conftool/dbconfig/20250609-174457-fceratto.json
  • 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P77324 and previous config saved to /var/cache/conftool/dbconfig/20250609-174240-marostegui.json
  • 17:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P77323 and previous config saved to /var/cache/conftool/dbconfig/20250609-172950-fceratto.json
  • 17:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P77322 and previous config saved to /var/cache/conftool/dbconfig/20250609-172733-marostegui.json
  • 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 17:21 inflatador: bking@cumin1003 power down cirrussearch1063 to prevent logspam T394350
  • 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 17:18 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 17:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P77320 and previous config saved to /var/cache/conftool/dbconfig/20250609-171443-fceratto.json
  • 17:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T396130)', diff saved to https://phabricator.wikimedia.org/P77319 and previous config saved to /var/cache/conftool/dbconfig/20250609-171225-marostegui.json
  • 17:12 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:10 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 17:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T396130)', diff saved to https://phabricator.wikimedia.org/P77318 and previous config saved to /var/cache/conftool/dbconfig/20250609-170939-marostegui.json
  • 17:09 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 17:09 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 17:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 17:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T396130)', diff saved to https://phabricator.wikimedia.org/P77317 and previous config saved to /var/cache/conftool/dbconfig/20250609-170447-marostegui.json
  • 17:04 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 17:04 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 17:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2043
  • 16:59 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2043
  • 16:59 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cp2043 to codfw - jhancock@cumin2002"
  • 16:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T395241)', diff saved to https://phabricator.wikimedia.org/P77316 and previous config saved to /var/cache/conftool/dbconfig/20250609-165936-fceratto.json
  • 16:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cp2043 to codfw - jhancock@cumin2002"
  • 16:56 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2058.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T395241)', diff saved to https://phabricator.wikimedia.org/P77315 and previous config saved to /var/cache/conftool/dbconfig/20250609-165125-fceratto.json
  • 16:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 16:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T395241)', diff saved to https://phabricator.wikimedia.org/P77314 and previous config saved to /var/cache/conftool/dbconfig/20250609-165100-fceratto.json
  • 16:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P77313 and previous config saved to /var/cache/conftool/dbconfig/20250609-164940-marostegui.json
  • 16:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2058.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:42 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2058.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2058.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2058
  • 16:38 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2058
  • 16:37 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:37 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cp2058 to codfw - jhancock@cumin2002"
  • 16:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cp2058 to codfw - jhancock@cumin2002"
  • 16:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P77312 and previous config saved to /var/cache/conftool/dbconfig/20250609-163553-fceratto.json
  • 16:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P77311 and previous config saved to /var/cache/conftool/dbconfig/20250609-163433-marostegui.json
  • 16:32 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:30 dancy@deploy1003: Finished scap sync-world: Testing T395514 (duration: 34m 14s)
  • 16:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P77310 and previous config saved to /var/cache/conftool/dbconfig/20250609-162046-fceratto.json
  • 16:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T396130)', diff saved to https://phabricator.wikimedia.org/P77309 and previous config saved to /var/cache/conftool/dbconfig/20250609-161926-marostegui.json
  • 16:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T396130)', diff saved to https://phabricator.wikimedia.org/P77308 and previous config saved to /var/cache/conftool/dbconfig/20250609-161640-marostegui.json
  • 16:16 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 16:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T396130)', diff saved to https://phabricator.wikimedia.org/P77307 and previous config saved to /var/cache/conftool/dbconfig/20250609-161618-marostegui.json
  • 16:12 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 16:12 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 16:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T395241)', diff saved to https://phabricator.wikimedia.org/P77306 and previous config saved to /var/cache/conftool/dbconfig/20250609-160539-fceratto.json
  • 16:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P77305 and previous config saved to /var/cache/conftool/dbconfig/20250609-160111-marostegui.json
  • 15:57 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T395241)', diff saved to https://phabricator.wikimedia.org/P77304 and previous config saved to /var/cache/conftool/dbconfig/20250609-155730-fceratto.json
  • 15:57 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 15:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T395241)', diff saved to https://phabricator.wikimedia.org/P77303 and previous config saved to /var/cache/conftool/dbconfig/20250609-155705-fceratto.json
  • 15:55 dancy@deploy1003: Started scap sync-world: Testing T395514
  • 15:52 dancy@deploy1003: Installation of scap version "4.172.0" completed for 182 hosts
  • 15:46 dancy@deploy1003: Installing scap version "4.172.0" for 182 host(s)
  • 15:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P77302 and previous config saved to /var/cache/conftool/dbconfig/20250609-154604-marostegui.json
  • 15:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P77301 and previous config saved to /var/cache/conftool/dbconfig/20250609-154158-fceratto.json
  • 15:33 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 15:33 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 15:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T396130)', diff saved to https://phabricator.wikimedia.org/P77300 and previous config saved to /var/cache/conftool/dbconfig/20250609-153057-marostegui.json
  • 15:28 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
  • 15:28 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
  • 15:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T396130)', diff saved to https://phabricator.wikimedia.org/P77299 and previous config saved to /var/cache/conftool/dbconfig/20250609-152810-marostegui.json
  • 15:28 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
  • 15:28 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 15:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T396130)', diff saved to https://phabricator.wikimedia.org/P77298 and previous config saved to /var/cache/conftool/dbconfig/20250609-152749-marostegui.json
  • 15:27 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
  • 15:27 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 15:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P77297 and previous config saved to /var/cache/conftool/dbconfig/20250609-152651-fceratto.json
  • 15:26 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 15:25 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 15:25 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 15:25 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 15:25 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 15:25 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 15:24 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 15:24 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 15:23 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 15:23 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 15:22 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 15:22 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:21 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:16 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 15:16 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 15:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P77296 and previous config saved to /var/cache/conftool/dbconfig/20250609-151242-marostegui.json
  • 15:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T395241)', diff saved to https://phabricator.wikimedia.org/P77295 and previous config saved to /var/cache/conftool/dbconfig/20250609-151144-fceratto.json
  • 15:01 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T395241)', diff saved to https://phabricator.wikimedia.org/P77294 and previous config saved to /var/cache/conftool/dbconfig/20250609-150134-fceratto.json
  • 15:01 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 15:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T395241)', diff saved to https://phabricator.wikimedia.org/P77293 and previous config saved to /var/cache/conftool/dbconfig/20250609-150108-fceratto.json
  • 14:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P77292 and previous config saved to /var/cache/conftool/dbconfig/20250609-145735-marostegui.json
  • 14:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P77291 and previous config saved to /var/cache/conftool/dbconfig/20250609-144601-fceratto.json
  • 14:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T396130)', diff saved to https://phabricator.wikimedia.org/P77290 and previous config saved to /var/cache/conftool/dbconfig/20250609-144230-marostegui.json
  • 14:40 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts prometheus7001.magru.wmnet
  • 14:40 tappof@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:40 tappof@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - tappof@cumin1002"
  • 14:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T396130)', diff saved to https://phabricator.wikimedia.org/P77289 and previous config saved to /var/cache/conftool/dbconfig/20250609-143938-marostegui.json
  • 14:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 14:39 tappof@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - tappof@cumin1002"
  • 14:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T396130)', diff saved to https://phabricator.wikimedia.org/P77288 and previous config saved to /var/cache/conftool/dbconfig/20250609-143917-marostegui.json
  • 14:36 tappof@cumin1002: START - Cookbook sre.dns.netbox
  • 14:31 tappof@cumin1002: START - Cookbook sre.hosts.decommission for hosts prometheus7001.magru.wmnet
  • 14:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P77287 and previous config saved to /var/cache/conftool/dbconfig/20250609-143054-fceratto.json
  • 14:30 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet
  • 14:24 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:24 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P77286 and previous config saved to /var/cache/conftool/dbconfig/20250609-142410-marostegui.json
  • 14:18 godog: rollout cgroup memory limit + gomemlimit for thanos-sidecar - T394318
  • 14:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T395241)', diff saved to https://phabricator.wikimedia.org/P77285 and previous config saved to /var/cache/conftool/dbconfig/20250609-141548-fceratto.json
  • 14:15 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P77284 and previous config saved to /var/cache/conftool/dbconfig/20250609-140903-marostegui.json
  • 14:07 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T395241)', diff saved to https://phabricator.wikimedia.org/P77283 and previous config saved to /var/cache/conftool/dbconfig/20250609-140722-fceratto.json
  • 14:07 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 14:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T395241)', diff saved to https://phabricator.wikimedia.org/P77282 and previous config saved to /var/cache/conftool/dbconfig/20250609-140656-fceratto.json
  • 13:55 sukhe@dns1004: END - running authdns-update
  • 13:55 sukhe@dns1004: START - running authdns-update
  • 13:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T396130)', diff saved to https://phabricator.wikimedia.org/P77281 and previous config saved to /var/cache/conftool/dbconfig/20250609-135355-marostegui.json
  • 13:52 fnegri@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1015.eqiad.wmnet with reason: Upgrading clouddbs T394372
  • 13:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P77280 and previous config saved to /var/cache/conftool/dbconfig/20250609-135150-fceratto.json
  • 13:51 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet
  • 13:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T396130)', diff saved to https://phabricator.wikimedia.org/P77279 and previous config saved to /var/cache/conftool/dbconfig/20250609-135105-marostegui.json
  • 13:50 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T396130)', diff saved to https://phabricator.wikimedia.org/P77278 and previous config saved to /var/cache/conftool/dbconfig/20250609-135043-marostegui.json
  • 13:45 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns5004*} and (A:dnsbox)
  • 13:45 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns5004.wikimedia.org
  • 13:45 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns6002*} and (A:dnsbox)
  • 13:45 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns6002.wikimedia.org
  • 13:42 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 13:42 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 13:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P77277 and previous config saved to /var/cache/conftool/dbconfig/20250609-133643-fceratto.json
  • 13:35 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns6002.wikimedia.org
  • 13:35 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns6002*} and (A:dnsbox)
  • 13:35 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns5004.wikimedia.org
  • 13:35 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns5004*} and (A:dnsbox)
  • 13:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P77276 and previous config saved to /var/cache/conftool/dbconfig/20250609-133535-marostegui.json
  • 13:35 sukhe@dns1004: END - running authdns-update
  • 13:34 sukhe@dns1004: START - running authdns-update
  • 13:34 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns3004*} and (A:dnsbox)
  • 13:34 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns3004.wikimedia.org
  • 13:33 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns4004*} and (A:dnsbox)
  • 13:33 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns4004.wikimedia.org
  • 13:31 taavi@deploy1003: Finished scap sync-world: Backport for logging: Allow sampling of Logstash logs (T395967), logging: Sample some high-volume log streams (T394402) (duration: 24m 30s)
  • 13:30 vgutierrez@dns1004: END - running authdns-update
  • 13:30 vgutierrez@dns1004: START - running authdns-update
  • 13:22 taavi@deploy1003: taavi, tgr: Continuing with sync
  • 13:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T395241)', diff saved to https://phabricator.wikimedia.org/P77275 and previous config saved to /var/cache/conftool/dbconfig/20250609-132136-fceratto.json
  • 13:21 taavi@deploy1003: taavi, tgr: Backport for logging: Allow sampling of Logstash logs (T395967), logging: Sample some high-volume log streams (T394402) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P77274 and previous config saved to /var/cache/conftool/dbconfig/20250609-132028-marostegui.json
  • 13:19 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns4004.wikimedia.org
  • 13:19 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns4004*} and (A:dnsbox)
  • 13:19 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns3004.wikimedia.org
  • 13:19 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns3004*} and (A:dnsbox)
  • 13:17 taavi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:17 taavi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add AAAA record for openstack.codfw1dev.wikimediacloud.org - taavi@cumin1002"
  • 13:17 taavi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add AAAA record for openstack.codfw1dev.wikimediacloud.org - taavi@cumin1002"
  • 13:17 taavi@dns1004: END - running authdns-update
  • 13:16 taavi@dns1004: START - running authdns-update
  • 13:13 sukhe@dns1004: FAIL - running authdns-update
  • 13:12 sukhe@dns1004: START - running authdns-update
  • 13:12 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T395241)', diff saved to https://phabricator.wikimedia.org/P77273 and previous config saved to /var/cache/conftool/dbconfig/20250609-131238-fceratto.json
  • 13:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 13:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T395241)', diff saved to https://phabricator.wikimedia.org/P77272 and previous config saved to /var/cache/conftool/dbconfig/20250609-131206-fceratto.json
  • 13:11 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns1004*} and (A:dnsbox)
  • 13:11 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns1004.wikimedia.org
  • 13:11 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns2005*} and (A:dnsbox)
  • 13:11 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns2005.wikimedia.org
  • 13:10 taavi@cumin1002: START - Cookbook sre.dns.netbox
  • 13:07 taavi@deploy1003: Started scap sync-world: Backport for logging: Allow sampling of Logstash logs (T395967), logging: Sample some high-volume log streams (T394402)
  • 13:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T396130)', diff saved to https://phabricator.wikimedia.org/P77271 and previous config saved to /var/cache/conftool/dbconfig/20250609-130521-marostegui.json
  • 13:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T396130)', diff saved to https://phabricator.wikimedia.org/P77270 and previous config saved to /var/cache/conftool/dbconfig/20250609-130230-marostegui.json
  • 13:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 12:58 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns2005.wikimedia.org
  • 12:58 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns2005*} and (A:dnsbox)
  • 12:58 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns1004.wikimedia.org
  • 12:58 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns1004*} and (A:dnsbox)
  • 12:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P77269 and previous config saved to /var/cache/conftool/dbconfig/20250609-125659-fceratto.json
  • 12:55 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 12:50 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2229.codfw.wmnet with reason: Maintenance
  • 12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T396130)', diff saved to https://phabricator.wikimedia.org/P77268 and previous config saved to /var/cache/conftool/dbconfig/20250609-124534-marostegui.json
  • 12:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P77267 and previous config saved to /var/cache/conftool/dbconfig/20250609-124152-fceratto.json
  • 12:41 jgleeson: SmashPig upgraded from 3222a1f3 to 042d5a5b
  • 12:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P77266 and previous config saved to /var/cache/conftool/dbconfig/20250609-123027-marostegui.json
  • 12:29 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 12:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T395241)', diff saved to https://phabricator.wikimedia.org/P77265 and previous config saved to /var/cache/conftool/dbconfig/20250609-122644-fceratto.json
  • 12:24 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 12:23 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:23 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new private IP for cloudcontrol2010-dev - andrew@cumin1002"
  • 12:23 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new private IP for cloudcontrol2010-dev - andrew@cumin1002"
  • 12:20 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 12:19 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 12:17 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:17 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new private IP for cloudcontrol2010-dev - andrew@cumin1002"
  • 12:17 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new private IP for cloudcontrol2010-dev - andrew@cumin1002"
  • 12:17 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T395241)', diff saved to https://phabricator.wikimedia.org/P77264 and previous config saved to /var/cache/conftool/dbconfig/20250609-121700-fceratto.json
  • 12:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 12:16 godog: bounce thanos-store on titan1*
  • 12:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T395241)', diff saved to https://phabricator.wikimedia.org/P77263 and previous config saved to /var/cache/conftool/dbconfig/20250609-121636-fceratto.json
  • 12:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P77262 and previous config saved to /var/cache/conftool/dbconfig/20250609-121520-marostegui.json
  • 12:13 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 12:13 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 12:10 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 12:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P77261 and previous config saved to /var/cache/conftool/dbconfig/20250609-120129-fceratto.json
  • 12:00 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 12:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T396130)', diff saved to https://phabricator.wikimedia.org/P77260 and previous config saved to /var/cache/conftool/dbconfig/20250609-120013-marostegui.json
  • 11:58 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 11:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2224 (T396130)', diff saved to https://phabricator.wikimedia.org/P77258 and previous config saved to /var/cache/conftool/dbconfig/20250609-115350-marostegui.json
  • 11:53 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2224.codfw.wmnet with reason: Maintenance
  • 11:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T396130)', diff saved to https://phabricator.wikimedia.org/P77257 and previous config saved to /var/cache/conftool/dbconfig/20250609-115328-marostegui.json
  • 11:53 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 11:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P77256 and previous config saved to /var/cache/conftool/dbconfig/20250609-114622-fceratto.json
  • 11:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P77255 and previous config saved to /var/cache/conftool/dbconfig/20250609-113821-marostegui.json
  • 11:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T395241)', diff saved to https://phabricator.wikimedia.org/P77254 and previous config saved to /var/cache/conftool/dbconfig/20250609-113113-fceratto.json
  • 11:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P77253 and previous config saved to /var/cache/conftool/dbconfig/20250609-112314-marostegui.json
  • 11:19 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T395241)', diff saved to https://phabricator.wikimedia.org/P77252 and previous config saved to /var/cache/conftool/dbconfig/20250609-111951-fceratto.json
  • 11:19 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2151* gradually with 4 steps - Pooling in
  • 11:19 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 11:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T395241)', diff saved to https://phabricator.wikimedia.org/P77250 and previous config saved to /var/cache/conftool/dbconfig/20250609-111926-fceratto.json
  • 11:18 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2158* gradually with 4 steps - Pooling in
  • 11:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T396130)', diff saved to https://phabricator.wikimedia.org/P77247 and previous config saved to /var/cache/conftool/dbconfig/20250609-110807-marostegui.json
  • 11:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P77245 and previous config saved to /var/cache/conftool/dbconfig/20250609-110418-fceratto.json
  • 11:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T396130)', diff saved to https://phabricator.wikimedia.org/P77243 and previous config saved to /var/cache/conftool/dbconfig/20250609-110140-marostegui.json
  • 11:01 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 11:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T396130)', diff saved to https://phabricator.wikimedia.org/P77242 and previous config saved to /var/cache/conftool/dbconfig/20250609-110118-marostegui.json
  • 10:54 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:54 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P77240 and previous config saved to /var/cache/conftool/dbconfig/20250609-104911-fceratto.json
  • 10:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P77238 and previous config saved to /var/cache/conftool/dbconfig/20250609-104611-marostegui.json
  • 10:34 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2151* gradually with 4 steps - Pooling in
  • 10:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T395241)', diff saved to https://phabricator.wikimedia.org/P77236 and previous config saved to /var/cache/conftool/dbconfig/20250609-103404-fceratto.json
  • 10:33 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:33 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:33 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2158* gradually with 4 steps - Pooling in
  • 10:31 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db2151.codfw.wmnet
  • 10:31 fceratto@cumin1002: START - Cookbook sre.hosts.remove-downtime for db2151.codfw.wmnet
  • 10:31 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db2158.codfw.wmnet
  • 10:31 fceratto@cumin1002: START - Cookbook sre.hosts.remove-downtime for db2158.codfw.wmnet
  • 10:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P77234 and previous config saved to /var/cache/conftool/dbconfig/20250609-103104-marostegui.json
  • 10:30 vgutierrez@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
  • 10:29 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
  • 10:22 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1160 (T395241)', diff saved to https://phabricator.wikimedia.org/P77233 and previous config saved to /var/cache/conftool/dbconfig/20250609-102214-fceratto.json
  • 10:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 10:16 vgutierrez: repooling lvs1013 handling ncredir@eqiad using katran based load balancing - T395228
  • 10:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T396130)', diff saved to https://phabricator.wikimedia.org/P77232 and previous config saved to /var/cache/conftool/dbconfig/20250609-101557-marostegui.json
  • 10:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2214', diff saved to https://phabricator.wikimedia.org/P77231 and previous config saved to /var/cache/conftool/dbconfig/20250609-101400-marostegui.json
  • 10:12 marostegui@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77230 and previous config saved to /var/cache/conftool/dbconfig/20250609-101258-root.json
  • 10:12 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2158.codfw.wmnet onto db2151.codfw.wmnet
  • 10:08 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 10:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 10:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2214 T395989', diff saved to https://phabricator.wikimedia.org/P77229 and previous config saved to /var/cache/conftool/dbconfig/20250609-100605-marostegui.json
  • 10:03 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 10:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T396130)', diff saved to https://phabricator.wikimedia.org/P77228 and previous config saved to /var/cache/conftool/dbconfig/20250609-100337-marostegui.json
  • 09:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P77227 and previous config saved to /var/cache/conftool/dbconfig/20250609-094830-marostegui.json
  • 09:42 marostegui: Migrate s2 eqiad dbmaint to SBR T383795
  • 09:35 vgutierrez@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
  • 09:34 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
  • 09:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P77226 and previous config saved to /var/cache/conftool/dbconfig/20250609-093323-marostegui.json
  • 09:31 vgutierrez: depooling lvs1013 before switching ncredir@eqiad to katran based load balancing - T395228
  • 09:28 tappof@dns1004: END - running authdns-update
  • 09:27 tappof@dns1004: START - running authdns-update
  • 09:20 vgutierrez: upload liberica 0.17 to apt.wm.o (bookworm-wikimedia) - T395228
  • 09:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T396130)', diff saved to https://phabricator.wikimedia.org/P77225 and previous config saved to /var/cache/conftool/dbconfig/20250609-091816-marostegui.json
  • 09:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T396130)', diff saved to https://phabricator.wikimedia.org/P77224 and previous config saved to /var/cache/conftool/dbconfig/20250609-091528-marostegui.json
  • 09:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 09:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T396130)', diff saved to https://phabricator.wikimedia.org/P77223 and previous config saved to /var/cache/conftool/dbconfig/20250609-091506-marostegui.json
  • 09:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P77222 and previous config saved to /var/cache/conftool/dbconfig/20250609-085959-marostegui.json
  • 08:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P77221 and previous config saved to /var/cache/conftool/dbconfig/20250609-084452-marostegui.json
  • 08:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T396130)', diff saved to https://phabricator.wikimedia.org/P77220 and previous config saved to /var/cache/conftool/dbconfig/20250609-082945-marostegui.json
  • 08:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T396130)', diff saved to https://phabricator.wikimedia.org/P77219 and previous config saved to /var/cache/conftool/dbconfig/20250609-082655-marostegui.json
  • 08:26 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 08:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T396130)', diff saved to https://phabricator.wikimedia.org/P77218 and previous config saved to /var/cache/conftool/dbconfig/20250609-082633-marostegui.json
  • 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P77217 and previous config saved to /var/cache/conftool/dbconfig/20250609-081126-marostegui.json
  • 08:08 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2243.codfw.wmnet onto db2244.codfw.wmnet
  • 08:08 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2244 gradually with 4 steps - Pool db2244.codfw.wmnet in after cloning
  • 07:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P77215 and previous config saved to /var/cache/conftool/dbconfig/20250609-075619-marostegui.json
  • 07:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T396130)', diff saved to https://phabricator.wikimedia.org/P77213 and previous config saved to /var/cache/conftool/dbconfig/20250609-074112-marostegui.json
  • 07:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T396130)', diff saved to https://phabricator.wikimedia.org/P77211 and previous config saved to /var/cache/conftool/dbconfig/20250609-073403-marostegui.json
  • 07:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 07:28 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 07:23 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 07:23 marostegui@cumin1002: START - Cookbook sre.mysql.pool db2244 gradually with 4 steps - Pool db2244.codfw.wmnet in after cloning
  • 06:22 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2243 gradually with 4 steps - Pool db2243.codfw.wmnet in after cloning
  • 05:42 marostegui: Add MariaDB 10.11.13 to the repo T395663
  • 05:37 marostegui@cumin1002: START - Cookbook sre.mysql.pool db2243 gradually with 4 steps - Pool db2243.codfw.wmnet in after cloning
  • 05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Add db2244 to dbctl depooled T393989', diff saved to https://phabricator.wikimedia.org/P77205 and previous config saved to /var/cache/conftool/dbconfig/20250609-052451-marostegui.json
  • 05:00 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2243 - Depool db2243.codfw.wmnet to then clone it to db2244.codfw.wmnet - marostegui@cumin1002
  • 05:00 marostegui@cumin1002: START - Cookbook sre.mysql.depool db2243 - Depool db2243.codfw.wmnet to then clone it to db2244.codfw.wmnet - marostegui@cumin1002
  • 05:00 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db2243.codfw.wmnet onto db2244.codfw.wmnet

2025-06-08

  • 12:04 Ammar: Ran fixStuckGlobalRename.php for T396290 and T396291
  • 09:53 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0) reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20250526/ using stat1011.eqiad.wmnet)

2025-06-07

  • 11:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2007.codfw.wmnet with OS bullseye
  • 11:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2006.codfw.wmnet with OS bullseye
  • 08:12 elukey: restart apache2 / php-fpm on phab1004
  • 04:18 mutante: restarted apache on phab1004

2025-06-06

  • 21:33 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 21:25 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 21:19 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 21:15 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 21:02 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: downtime before decom
  • 20:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2007.codfw.wmnet with reason: host reimage
  • 20:40 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 20:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2007.codfw.wmnet with reason: host reimage
  • 20:35 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 20:16 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 20:15 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2007.codfw.wmnet with OS bullseye
  • 20:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2006.codfw.wmnet with reason: host reimage
  • 20:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2006.codfw.wmnet with reason: host reimage
  • 19:45 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
  • 19:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2006.codfw.wmnet with OS bullseye
  • 19:11 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: T383811 - bking@cumin2002
  • 19:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2244.codfw.wmnet with OS bookworm
  • 19:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:25 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
  • 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2244.codfw.wmnet with reason: host reimage
  • 17:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2244.codfw.wmnet with reason: host reimage
  • 17:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2244.codfw.wmnet with OS bookworm
  • 17:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2244']
  • 17:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2244']
  • 17:20 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: T383811 - bking@cumin2002
  • 17:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2244.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:08 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: T383811 - bking@cumin2002
  • 17:06 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: T383811 - bking@cumin2002
  • 17:00 sukhe: forced agent run on O:alerting_host to reload vopsbot to add cdobbins
  • 16:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2244.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2244.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2244.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2244
  • 16:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2244
  • 16:44 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:42 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:08 sbassett: Deployed security update to fix T396111
  • 15:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
  • 15:34 eevans@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be2066.codfw.wmnet
  • 15:24 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
  • 15:23 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
  • 15:19 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1013.eqiad.wmnet
  • 15:19 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs1013.eqiad.wmnet
  • 14:53 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:42 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:24 sukhe@dns1004: END - running authdns-update
  • 14:23 sukhe@dns1004: START - running authdns-update
  • 14:23 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns2004*} and (A:dnsbox)
  • 14:23 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns2004.wikimedia.org
  • 14:22 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns1005*} and (A:dnsbox)
  • 14:22 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns1005.wikimedia.org
  • 14:10 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns1005.wikimedia.org
  • 14:10 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns1005*} and (A:dnsbox)
  • 14:10 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns2004.wikimedia.org
  • 14:10 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns2004*} and (A:dnsbox)
  • 13:51 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns2006*} and (A:dnsbox)
  • 13:51 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns2006.wikimedia.org
  • 13:49 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns1006*} and (A:dnsbox)
  • 13:49 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns1006.wikimedia.org
  • 13:40 vgutierrez@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
  • 13:40 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
  • 13:40 vgutierrez@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: switching to katran
  • 13:35 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns2006.wikimedia.org
  • 13:35 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns2006*} and (A:dnsbox)
  • 13:34 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns1006.wikimedia.org
  • 13:34 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns1006*} and (A:dnsbox)
  • 13:32 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns3003*} and (A:dnsbox)
  • 13:32 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns3003.wikimedia.org
  • 13:31 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns6001*} and (A:dnsbox)
  • 13:31 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns6001.wikimedia.org
  • 13:21 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:21 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add back entry for mistakenly deleted ssw1-a8-codfw IP - cmooney@cumin1003"
  • 13:21 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add back entry for mistakenly deleted ssw1-a8-codfw IP - cmooney@cumin1003"
  • 13:18 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns6001.wikimedia.org
  • 13:18 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns6001*} and (A:dnsbox)
  • 13:18 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns3003.wikimedia.org
  • 13:18 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns3003*} and (A:dnsbox)
  • 13:13 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns5003*} and (A:dnsbox)
  • 13:13 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns5003.wikimedia.org
  • 13:09 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns4003*} and (A:dnsbox)
  • 13:09 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns4003.wikimedia.org
  • 13:06 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 12:58 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns5003.wikimedia.org
  • 12:58 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns5003*} and (A:dnsbox)
  • 12:58 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns4003.wikimedia.org
  • 12:58 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns4003*} and (A:dnsbox)
  • 12:20 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2158 - Depool db2158.codfw.wmnet to then clone it to db2151.codfw.wmnet - fceratto@cumin1002
  • 12:20 fceratto@cumin1002: START - Cookbook sre.mysql.depool db2158 - Depool db2158.codfw.wmnet to then clone it to db2151.codfw.wmnet - fceratto@cumin1002
  • 12:19 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db2158.codfw.wmnet onto db2151.codfw.wmnet
  • 12:03 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 12:03 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 11:52 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Disabling notifications
  • 11:42 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 11:42 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 11:32 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:32 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:32 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:31 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:31 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:30 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 11:30 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 11:28 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 11:28 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:28 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:28 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:27 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:27 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 11:26 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
  • 11:25 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
  • 11:22 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:21 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 11:18 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:18 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:12 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:12 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:40 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2151* - Log issue and disk filled up
  • 09:40 fceratto@cumin1002: START - Cookbook sre.mysql.depool db2151* - Log issue and disk filled up
  • 09:05 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:04 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:37 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 398044
  • 08:36 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 398044
  • 08:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46997
  • 08:34 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 46997
  • 08:33 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 394065
  • 08:33 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 394065
  • 08:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46562
  • 08:32 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 46562
  • 08:30 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13150
  • 08:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 13150
  • 08:28 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 199524
  • 08:25 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 199524
  • 08:24 volans@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts cirrussearch2113.codfw.wmnet
  • 08:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 63199
  • 08:23 volans@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cirrussearch2113.codfw.wmnet
  • 08:23 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 63199
  • 08:22 volans@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts cirrussearch2113.codfw.wmnet
  • 08:22 volans@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cirrussearch2113.codfw.wmnet
  • 08:11 volans@cumin1003: START - Cookbook sre.hosts.reboot-single for host cirrussearch2113.codfw.wmnet
  • 08:10 volans@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cirrussearch2113.codfw.wmnet
  • 07:52 ryankemper@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on cirrussearch2113.codfw.wmnet with reason: T394543
  • 06:15 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1157-1159].eqiad.wmnet with reason: Upgrade an-worker hard drives from 4TB to 8TB group 5 - rack F1
  • 06:14 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1160-1162].eqiad.wmnet with reason: Upgrade an-worker hard drives from 4TB to 8TB group 5 - rack F1
  • 06:13 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-b3-magru
  • 06:13 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device asw1-b3-magru
  • 05:42 XioNoX: push pfw policies - T395904
  • 03:02 eevans@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
  • 02:57 eevans@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw

2025-06-05

  • 22:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 22:04 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 22:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 22:02 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 21:58 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 21:58 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 21:57 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 21:42 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 21:29 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 21:24 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 21:00 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 21:00 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 21:00 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 20:59 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 20:57 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 20:56 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 20:56 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 20:56 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 20:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 20:49 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 20:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for Fix back compat for data-chart (T395462) (duration: 10m 05s)
  • 20:17 jdlrobson@deploy1003: jdlrobson: Continuing with sync
  • 20:16 jdlrobson@deploy1003: jdlrobson: Backport for Fix back compat for data-chart (T395462) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:14 jdlrobson@deploy1003: Started scap sync-world: Backport for Fix back compat for data-chart (T395462)
  • 20:09 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
  • 19:24 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:22 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:15 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:14 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:14 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1186
  • 19:13 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1186
  • 19:13 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:12 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:12 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1186
  • 19:12 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1186
  • 18:52 phuedx: Disabled the SDS 2.4.11 Synthetic A/A Test in xLab
  • 18:49 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
  • 18:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:43 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:32 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:30 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:21 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.4 refs T392174
  • 18:21 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:20 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1186
  • 18:20 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1186
  • 18:19 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:18 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 18:17 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 18:17 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20250526/ using stat1011.eqiad.wmnet)
  • 18:17 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:17 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:05 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 18:04 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 17:28 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:27 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:25 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:24 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:24 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:24 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:21 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
  • 17:21 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
  • 17:15 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1013.eqiad.wmnet
  • 17:15 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs1013.eqiad.wmnet
  • 16:54 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for acmechief2002.codfw.wmnet,acmechief1002.eqiad.wmnet,acmechief-test2001.codfw.wmnet,acmechief-test1001.eqiad.wmnet
  • 16:54 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for acmechief2002.codfw.wmnet,acmechief1002.eqiad.wmnet,acmechief-test2001.codfw.wmnet,acmechief-test1001.eqiad.wmnet
  • 16:51 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@930d28b]: adapt check_bad_parsing to dumps 2.0 (duration: 01m 16s)
  • 16:50 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@930d28b]: adapt check_bad_parsing to dumps 2.0
  • 16:50 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on acmechief2002.codfw.wmnet,acmechief1002.eqiad.wmnet,acmechief-test2001.codfw.wmnet,acmechief-test1001.eqiad.wmnet with reason: Reboots
  • 16:27 jdlrobson@deploy1003: Finished scap sync-world: Backport for Revert "Deploy survey to en at twenty percent" (duration: 11m 23s)
  • 16:20 jdlrobson@deploy1003: jdlrobson, jdrewniak: Continuing with sync
  • 16:18 jdlrobson@deploy1003: jdlrobson, jdrewniak: Backport for Revert "Deploy survey to en at twenty percent" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T395241)', diff saved to https://phabricator.wikimedia.org/P77191 and previous config saved to /var/cache/conftool/dbconfig/20250605-161701-fceratto.json
  • 16:16 jdlrobson@deploy1003: Started scap sync-world: Backport for Revert "Deploy survey to en at twenty percent"
  • 16:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2244.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2244.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:02 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2244
  • 16:02 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2244
  • 16:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P77190 and previous config saved to /var/cache/conftool/dbconfig/20250605-160154-fceratto.json
  • 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2244 to codfw - jhancock@cumin2002"
  • 16:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2244 to codfw - jhancock@cumin2002"
  • 15:57 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:55 aokoth@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on doc1003.eqiad.wmnet with reason: Bookworm Migration
  • 15:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P77189 and previous config saved to /var/cache/conftool/dbconfig/20250605-154647-fceratto.json
  • 15:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T395241)', diff saved to https://phabricator.wikimedia.org/P77188 and previous config saved to /var/cache/conftool/dbconfig/20250605-153139-fceratto.json
  • 15:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2237 (T395241)', diff saved to https://phabricator.wikimedia.org/P77187 and previous config saved to /var/cache/conftool/dbconfig/20250605-152314-fceratto.json
  • 15:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance
  • 15:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T395241)', diff saved to https://phabricator.wikimedia.org/P77186 and previous config saved to /var/cache/conftool/dbconfig/20250605-152248-fceratto.json
  • 15:08 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2004-dev.codfw.wmnet
  • 15:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P77185 and previous config saved to /var/cache/conftool/dbconfig/20250605-150741-fceratto.json
  • 14:59 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2004-dev.codfw.wmnet
  • 14:59 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2003-dev.codfw.wmnet
  • 14:53 damilare: payments-wiki upgraded from 2d8b655a to aa102260
  • 14:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P77184 and previous config saved to /var/cache/conftool/dbconfig/20250605-145234-fceratto.json
  • 14:50 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2003-dev.codfw.wmnet
  • 14:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T395241)', diff saved to https://phabricator.wikimedia.org/P77183 and previous config saved to /var/cache/conftool/dbconfig/20250605-143724-fceratto.json
  • 14:29 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2236 (T395241)', diff saved to https://phabricator.wikimedia.org/P77182 and previous config saved to /var/cache/conftool/dbconfig/20250605-142908-fceratto.json
  • 14:29 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance
  • 14:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T395241)', diff saved to https://phabricator.wikimedia.org/P77181 and previous config saved to /var/cache/conftool/dbconfig/20250605-142840-fceratto.json
  • 14:20 taavi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 'private.codfw.wikimedia.cloud$' on codfw recursors
  • 14:20 taavi@cumin1002: START - Cookbook sre.dns.wipe-cache 'private.codfw.wikimedia.cloud$' on codfw recursors
  • 14:19 tgr@deploy1003: Unlocked for deployment [MediaWiki]: T395468 (duration: 39m 39s)
  • 14:19 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: switching to katran
  • 14:18 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
  • 14:18 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
  • 14:18 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2007
  • 14:18 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host sretest2007
  • 14:17 taavi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:17 taavi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign cloud-private v6 addresses for codfw1dev devices - taavi@cumin1002"
  • 14:17 taavi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign cloud-private v6 addresses for codfw1dev devices - taavi@cumin1002"
  • 14:17 tgr: deploying a PrivateSettings config change
  • 14:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P77179 and previous config saved to /var/cache/conftool/dbconfig/20250605-141333-fceratto.json
  • 14:13 taavi@cumin1002: START - Cookbook sre.dns.netbox
  • 14:11 marostegui@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77178 and previous config saved to /var/cache/conftool/dbconfig/20250605-141108-root.json
  • 13:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P77177 and previous config saved to /var/cache/conftool/dbconfig/20250605-135826-fceratto.json
  • 13:56 marostegui@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77176 and previous config saved to /var/cache/conftool/dbconfig/20250605-135603-root.json
  • 13:51 marostegui: Migrate s2 codfw to SBR dbmaint T383795
  • 13:50 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 13:50 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2037.codfw.wmnet
  • 13:49 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 13:49 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2037.codfw.wmnet
  • 13:49 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 13:49 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 13:49 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 13:48 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 13:48 vgutierrez: upload liberica 0.16 to bookworm-wikimedia (apt.wm.o) - T395228
  • 13:48 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 13:47 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 13:44 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2037.codfw.wmnet
  • 13:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T395241)', diff saved to https://phabricator.wikimedia.org/P77175 and previous config saved to /var/cache/conftool/dbconfig/20250605-134319-fceratto.json
  • 13:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77174 and previous config saved to /var/cache/conftool/dbconfig/20250605-134158-root.json
  • 13:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77173 and previous config saved to /var/cache/conftool/dbconfig/20250605-134153-root.json
  • 13:40 marostegui@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77172 and previous config saved to /var/cache/conftool/dbconfig/20250605-134057-root.json
  • 13:40 moritzm: installing net-tools bugfix updates for bookworm
  • 13:40 tgr@deploy1003: Locking from deployment [MediaWiki]: T395468
  • 13:38 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2037.codfw.wmnet
  • 13:37 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2036.codfw.wmnet
  • 13:37 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2036.codfw.wmnet
  • 13:35 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T395241)', diff saved to https://phabricator.wikimedia.org/P77171 and previous config saved to /var/cache/conftool/dbconfig/20250605-133500-fceratto.json
  • 13:34 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 13:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T395241)', diff saved to https://phabricator.wikimedia.org/P77170 and previous config saved to /var/cache/conftool/dbconfig/20250605-133434-fceratto.json
  • 13:31 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2036.codfw.wmnet
  • 13:26 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77169 and previous config saved to /var/cache/conftool/dbconfig/20250605-132652-root.json
  • 13:26 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77168 and previous config saved to /var/cache/conftool/dbconfig/20250605-132648-root.json
  • 13:25 marostegui@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77167 and previous config saved to /var/cache/conftool/dbconfig/20250605-132552-root.json
  • 13:23 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netflow7001.magru.wmnet
  • 13:23 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:23 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netflow7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 13:21 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netflow7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 13:21 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P77166 and previous config saved to /var/cache/conftool/dbconfig/20250605-131926-fceratto.json
  • 13:18 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2036.codfw.wmnet
  • 13:17 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 13:16 gkyziridis@deploy1003: Finished scap sync-world: Backport for ores-extension: enable extension with revertrisk filter for second batch of wikis (excluding azwiki) (T395823) (duration: 11m 51s)
  • 13:12 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts netflow7001.magru.wmnet
  • 13:11 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77165 and previous config saved to /var/cache/conftool/dbconfig/20250605-131147-root.json
  • 13:11 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77164 and previous config saved to /var/cache/conftool/dbconfig/20250605-131142-root.json
  • 13:10 marostegui@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77163 and previous config saved to /var/cache/conftool/dbconfig/20250605-131046-root.json
  • 13:09 gkyziridis@deploy1003: gkyziridis: Continuing with sync
  • 13:07 gkyziridis@deploy1003: gkyziridis: Backport for ores-extension: enable extension with revertrisk filter for second batch of wikis (excluding azwiki) (T395823) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:04 gkyziridis@deploy1003: Started scap sync-world: Backport for ores-extension: enable extension with revertrisk filter for second batch of wikis (excluding azwiki) (T395823)
  • 13:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P77162 and previous config saved to /var/cache/conftool/dbconfig/20250605-130419-fceratto.json
  • 13:03 taavi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "remove outdated octavia net - taavi@cumin1002"
  • 13:03 taavi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "remove outdated octavia net - taavi@cumin1002"
  • 12:56 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77161 and previous config saved to /var/cache/conftool/dbconfig/20250605-125641-root.json
  • 12:56 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77160 and previous config saved to /var/cache/conftool/dbconfig/20250605-125637-root.json
  • 12:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77159 and previous config saved to /var/cache/conftool/dbconfig/20250605-125540-root.json
  • 12:54 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2035.codfw.wmnet
  • 12:54 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2035.codfw.wmnet
  • 12:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 12:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2151 T395989', diff saved to https://phabricator.wikimedia.org/P77158 and previous config saved to /var/cache/conftool/dbconfig/20250605-125057-marostegui.json
  • 12:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T395241)', diff saved to https://phabricator.wikimedia.org/P77157 and previous config saved to /var/cache/conftool/dbconfig/20250605-124912-fceratto.json
  • 12:48 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2035.codfw.wmnet
  • 12:43 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2035.codfw.wmnet
  • 12:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77156 and previous config saved to /var/cache/conftool/dbconfig/20250605-124136-root.json
  • 12:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77155 and previous config saved to /var/cache/conftool/dbconfig/20250605-124131-root.json
  • 12:41 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T395241)', diff saved to https://phabricator.wikimedia.org/P77154 and previous config saved to /var/cache/conftool/dbconfig/20250605-124110-fceratto.json
  • 12:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 12:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T395241)', diff saved to https://phabricator.wikimedia.org/P77153 and previous config saved to /var/cache/conftool/dbconfig/20250605-124043-fceratto.json
  • 12:32 jakob@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:32 jakob@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 12:31 jakob@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:30 jakob@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 12:30 jakob@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:30 jakob@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 12:26 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77151 and previous config saved to /var/cache/conftool/dbconfig/20250605-122631-root.json
  • 12:26 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77150 and previous config saved to /var/cache/conftool/dbconfig/20250605-122625-root.json
  • 12:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P77149 and previous config saved to /var/cache/conftool/dbconfig/20250605-122537-fceratto.json
  • 12:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es[2042,2045].codfw.wmnet with reason: Maintenance
  • 12:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2042 es2045 T395241', diff saved to https://phabricator.wikimedia.org/P77147 and previous config saved to /var/cache/conftool/dbconfig/20250605-122035-marostegui.json
  • 12:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P77146 and previous config saved to /var/cache/conftool/dbconfig/20250605-121029-fceratto.json
  • 11:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T395241)', diff saved to https://phabricator.wikimedia.org/P77145 and previous config saved to /var/cache/conftool/dbconfig/20250605-115522-fceratto.json
  • 11:49 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 11:48 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 11:47 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T395241)', diff saved to https://phabricator.wikimedia.org/P77144 and previous config saved to /var/cache/conftool/dbconfig/20250605-114711-fceratto.json
  • 11:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 11:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 11:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T395241)', diff saved to https://phabricator.wikimedia.org/P77143 and previous config saved to /var/cache/conftool/dbconfig/20250605-114213-fceratto.json
  • 11:35 moritzm: installing Linux 5.10.237 on Bullseye hosts
  • 11:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P77142 and previous config saved to /var/cache/conftool/dbconfig/20250605-112706-fceratto.json
  • 11:26 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2032.codfw.wmnet
  • 11:25 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2032.codfw.wmnet
  • 11:25 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77141 and previous config saved to /var/cache/conftool/dbconfig/20250605-112518-root.json
  • 11:25 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77140 and previous config saved to /var/cache/conftool/dbconfig/20250605-112511-root.json
  • 11:19 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2032.codfw.wmnet
  • 11:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P77139 and previous config saved to /var/cache/conftool/dbconfig/20250605-111158-fceratto.json
  • 11:10 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77138 and previous config saved to /var/cache/conftool/dbconfig/20250605-111013-root.json
  • 11:10 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77137 and previous config saved to /var/cache/conftool/dbconfig/20250605-111005-root.json
  • 11:09 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2032.codfw.wmnet
  • 11:04 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2031.codfw.wmnet
  • 11:03 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20250526/ using stat1011.eqiad.wmnet)
  • 11:03 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2031.codfw.wmnet
  • 11:02 gehel: restarting Blazegraph on wdqs1023 to address allocator decreasing alert
  • 10:57 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2031.codfw.wmnet
  • 10:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T395241)', diff saved to https://phabricator.wikimedia.org/P77136 and previous config saved to /var/cache/conftool/dbconfig/20250605-105650-fceratto.json
  • 10:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77135 and previous config saved to /var/cache/conftool/dbconfig/20250605-105507-root.json
  • 10:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77134 and previous config saved to /var/cache/conftool/dbconfig/20250605-105500-root.json
  • 10:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77133 and previous config saved to /var/cache/conftool/dbconfig/20250605-105216-root.json
  • 10:49 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2179 (T395241)', diff saved to https://phabricator.wikimedia.org/P77132 and previous config saved to /var/cache/conftool/dbconfig/20250605-104928-fceratto.json
  • 10:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 10:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T395241)', diff saved to https://phabricator.wikimedia.org/P77131 and previous config saved to /var/cache/conftool/dbconfig/20250605-104912-fceratto.json
  • 10:42 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2031.codfw.wmnet
  • 10:41 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2030.codfw.wmnet
  • 10:40 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77130 and previous config saved to /var/cache/conftool/dbconfig/20250605-104002-root.json
  • 10:39 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77129 and previous config saved to /var/cache/conftool/dbconfig/20250605-103954-root.json
  • 10:39 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1165.eqiad.wmnet
  • 10:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77128 and previous config saved to /var/cache/conftool/dbconfig/20250605-103711-root.json
  • 10:34 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
  • 10:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P77127 and previous config saved to /var/cache/conftool/dbconfig/20250605-103403-fceratto.json
  • 10:33 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2030.codfw.wmnet
  • 10:32 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1165.eqiad.wmnet
  • 10:31 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1164.eqiad.wmnet
  • 10:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
  • 10:30 Ammar: Ran fixStuckGlobalRename.php for T396054
  • 10:27 claime: Manual run of generatecaptcha on mw-cron with delete - T388531
  • 10:26 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 10:26 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 10:24 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77126 and previous config saved to /var/cache/conftool/dbconfig/20250605-102456-root.json
  • 10:24 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77125 and previous config saved to /var/cache/conftool/dbconfig/20250605-102449-root.json
  • 10:24 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1164.eqiad.wmnet
  • 10:23 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1163.eqiad.wmnet
  • 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2041 to es4 master and es2044 as es5 master', diff saved to https://phabricator.wikimedia.org/P77124 and previous config saved to /var/cache/conftool/dbconfig/20250605-102319-root.json
  • 10:22 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2029.codfw.wmnet
  • 10:22 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2029.codfw.wmnet
  • 10:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77123 and previous config saved to /var/cache/conftool/dbconfig/20250605-102205-root.json
  • 10:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
  • 10:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P77122 and previous config saved to /var/cache/conftool/dbconfig/20250605-101856-fceratto.json
  • 10:16 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2029.codfw.wmnet
  • 10:16 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1163.eqiad.wmnet
  • 10:09 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
  • 10:09 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77121 and previous config saved to /var/cache/conftool/dbconfig/20250605-100950-root.json
  • 10:09 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77120 and previous config saved to /var/cache/conftool/dbconfig/20250605-100943-root.json
  • 10:08 cgoubert@deploy1003: Finished scap sync-world: Backport for Revert "ores-extension: enable extension with revertrisk filter for second batch of wikis" (duration: 10m 36s)
  • 10:08 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2007.codfw.wmnet
  • 10:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77119 and previous config saved to /var/cache/conftool/dbconfig/20250605-100700-root.json
  • 10:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77118 and previous config saved to /var/cache/conftool/dbconfig/20250605-100527-root.json
  • 10:04 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es[2043,2046].codfw.wmnet with reason: Maintenance
  • 10:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2043 es2046 T395241', diff saved to https://phabricator.wikimedia.org/P77117 and previous config saved to /var/cache/conftool/dbconfig/20250605-100419-marostegui.json
  • 10:04 joal@deploy1003: Finished deploy [airflow-dags/analytics_test@d11bd51]: Update webrequest-test hive jar for ua-parser (duration: 00m 16s)
  • 10:03 joal@deploy1003: Started deploy [airflow-dags/analytics_test@d11bd51]: Update webrequest-test hive jar for ua-parser
  • 10:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T395241)', diff saved to https://phabricator.wikimedia.org/P77116 and previous config saved to /var/cache/conftool/dbconfig/20250605-100349-fceratto.json
  • 10:03 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2029.codfw.wmnet
  • 10:01 cgoubert@deploy1003: gkyziridis, cgoubert: Continuing with sync
  • 10:00 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2007.codfw.wmnet
  • 10:00 cgoubert@deploy1003: gkyziridis, cgoubert: Backport for Revert "ores-extension: enable extension with revertrisk filter for second batch of wikis" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:58 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2028.codfw.wmnet
  • 09:58 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1007.eqiad.wmnet
  • 09:58 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2028.codfw.wmnet
  • 09:57 cgoubert@deploy1003: Started scap sync-world: Backport for Revert "ores-extension: enable extension with revertrisk filter for second batch of wikis"
  • 09:54 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T395241)', diff saved to https://phabricator.wikimedia.org/P77115 and previous config saved to /var/cache/conftool/dbconfig/20250605-095415-fceratto.json
  • 09:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 09:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T395241)', diff saved to https://phabricator.wikimedia.org/P77114 and previous config saved to /var/cache/conftool/dbconfig/20250605-095347-fceratto.json
  • 09:53 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 09:53 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 09:52 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
  • 09:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77113 and previous config saved to /var/cache/conftool/dbconfig/20250605-095155-root.json
  • 09:50 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus1007.eqiad.wmnet
  • 09:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77112 and previous config saved to /var/cache/conftool/dbconfig/20250605-095022-root.json
  • 09:44 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2028.codfw.wmnet
  • 09:44 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 09:43 claime: Re-enabling CPU/RAM limits on mw-cron - T395436
  • 09:43 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 09:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P77111 and previous config saved to /var/cache/conftool/dbconfig/20250605-093840-fceratto.json
  • 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77110 and previous config saved to /var/cache/conftool/dbconfig/20250605-093649-root.json
  • 09:36 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:36 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2027.codfw.wmnet
  • 09:36 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2027.codfw.wmnet
  • 09:36 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:35 hnowlan: Migrate reading lists API out of restbase for group1 via rest-gateway
  • 09:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77109 and previous config saved to /var/cache/conftool/dbconfig/20250605-093515-root.json
  • 09:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 09:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2158 T395989', diff saved to https://phabricator.wikimedia.org/P77108 and previous config saved to /var/cache/conftool/dbconfig/20250605-093107-marostegui.json
  • 09:30 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2027.codfw.wmnet
  • 09:26 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2027.codfw.wmnet
  • 09:25 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add netflow7002 - jmm@cumin1003"
  • 09:24 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2026.codfw.wmnet
  • 09:24 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add netflow7002 - jmm@cumin1003"
  • 09:24 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
  • 09:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P77107 and previous config saved to /var/cache/conftool/dbconfig/20250605-092333-fceratto.json
  • 09:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77106 and previous config saved to /var/cache/conftool/dbconfig/20250605-092010-root.json
  • 09:18 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
  • 09:17 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping2004.codfw.wmnet
  • 09:13 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ping2004.codfw.wmnet
  • 09:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T395241)', diff saved to https://phabricator.wikimedia.org/P77105 and previous config saved to /var/cache/conftool/dbconfig/20250605-090825-fceratto.json
  • 09:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77104 and previous config saved to /var/cache/conftool/dbconfig/20250605-090504-root.json
  • 08:58 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T395241)', diff saved to https://phabricator.wikimedia.org/P77103 and previous config saved to /var/cache/conftool/dbconfig/20250605-085847-fceratto.json
  • 08:58 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 08:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T395241)', diff saved to https://phabricator.wikimedia.org/P77102 and previous config saved to /var/cache/conftool/dbconfig/20250605-085820-fceratto.json
  • 08:53 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2026.codfw.wmnet
  • 08:50 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2025.codfw.wmnet
  • 08:50 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
  • 08:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77101 and previous config saved to /var/cache/conftool/dbconfig/20250605-084959-root.json
  • 08:47 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1004.eqiad.wmnet
  • 08:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 08:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2169 T395989', diff saved to https://phabricator.wikimedia.org/P77100 and previous config saved to /var/cache/conftool/dbconfig/20250605-084557-marostegui.json
  • 08:44 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
  • 08:43 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ping1004.eqiad.wmnet
  • 08:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P77099 and previous config saved to /var/cache/conftool/dbconfig/20250605-084313-fceratto.json
  • 08:35 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:34 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:32 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:32 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:31 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2025.codfw.wmnet
  • 08:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P77098 and previous config saved to /var/cache/conftool/dbconfig/20250605-082806-fceratto.json
  • 08:24 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netflow7002.magru.wmnet
  • 08:24 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netflow7002.magru.wmnet with OS bookworm
  • 08:24 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:24 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T395241)', diff saved to https://phabricator.wikimedia.org/P77097 and previous config saved to /var/cache/conftool/dbconfig/20250605-081258-fceratto.json
  • 08:10 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2024.codfw.wmnet
  • 08:09 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2024.codfw.wmnet
  • 08:06 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow7002.magru.wmnet with reason: host reimage
  • 08:03 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow7002.magru.wmnet with reason: host reimage
  • 08:03 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
  • 08:03 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T395241)', diff saved to https://phabricator.wikimedia.org/P77096 and previous config saved to /var/cache/conftool/dbconfig/20250605-080310-fceratto.json
  • 08:03 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 08:00 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 08:00 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 07:56 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2024.codfw.wmnet
  • 07:50 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
  • 07:49 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2023.codfw.wmnet
  • 07:44 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 07:43 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 07:43 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
  • 07:38 gkyziridis@deploy1003: Sync cancelled.
  • 07:36 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host netflow7002.magru.wmnet with OS bookworm
  • 07:36 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 07:32 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2022.codfw.wmnet
  • 07:32 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
  • 07:27 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow7002.magru.wmnet - jmm@cumin1003"
  • 07:27 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow7002.magru.wmnet - jmm@cumin1003"
  • 07:26 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow7002.magru.wmnet on all recursors
  • 07:26 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache netflow7002.magru.wmnet on all recursors
  • 07:26 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:26 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow7002.magru.wmnet - jmm@cumin1003"
  • 07:26 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow7002.magru.wmnet - jmm@cumin1003"
  • 07:26 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
  • 07:23 gkyziridis@deploy1003: gkyziridis: Backport for ores-extension: enable extension with revertrisk filter for second batch of wikis (T395823) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:22 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 07:22 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host netflow7002.magru.wmnet
  • 07:21 gkyziridis@deploy1003: Started scap sync-world: Backport for ores-extension: enable extension with revertrisk filter for second batch of wikis (T395823)
  • 07:19 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2022.codfw.wmnet
  • 06:41 marostegui@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77095 and previous config saved to /var/cache/conftool/dbconfig/20250605-064137-root.json
  • 06:26 marostegui@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77094 and previous config saved to /var/cache/conftool/dbconfig/20250605-062629-root.json
  • 06:19 marostegui: Change datadir on pc8 dbmaint eqiad codfw T395983
  • 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc8 T395983', diff saved to https://phabricator.wikimedia.org/P77093 and previous config saved to /var/cache/conftool/dbconfig/20250605-061929-marostegui.json
  • 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc8 T395983', diff saved to https://phabricator.wikimedia.org/P77092 and previous config saved to /var/cache/conftool/dbconfig/20250605-061612-marostegui.json
  • 06:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2018.codfw.wmnet,pc1018.eqiad.wmnet with reason: Maintenance
  • 06:15 marostegui: Change datadir on pc7 dbmaint eqiad codfw T395983
  • 06:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc7 T395983', diff saved to https://phabricator.wikimedia.org/P77091 and previous config saved to /var/cache/conftool/dbconfig/20250605-061502-marostegui.json
  • 06:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc7 T395983', diff saved to https://phabricator.wikimedia.org/P77090 and previous config saved to /var/cache/conftool/dbconfig/20250605-061200-marostegui.json
  • 06:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2017.codfw.wmnet,pc1017.eqiad.wmnet with reason: Maintenance
  • 06:11 marostegui@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77089 and previous config saved to /var/cache/conftool/dbconfig/20250605-061124-root.json
  • 05:57 marostegui: Change datadir on pc6 dbmaint eqiad codfw T395983
  • 05:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc6 T395983', diff saved to https://phabricator.wikimedia.org/P77088 and previous config saved to /var/cache/conftool/dbconfig/20250605-055655-marostegui.json
  • 05:56 marostegui@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77087 and previous config saved to /var/cache/conftool/dbconfig/20250605-055619-root.json
  • 05:54 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2016.codfw.wmnet,pc1016.eqiad.wmnet with reason: Maintenance
  • 05:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc6 T395983', diff saved to https://phabricator.wikimedia.org/P77086 and previous config saved to /var/cache/conftool/dbconfig/20250605-055438-marostegui.json
  • 05:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc5 T395983', diff saved to https://phabricator.wikimedia.org/P77085 and previous config saved to /var/cache/conftool/dbconfig/20250605-055349-marostegui.json
  • 05:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2015.codfw.wmnet,pc1015.eqiad.wmnet with reason: Maintenance
  • 05:51 marostegui: Change datadir on pc5 dbmaint eqiad codfw T395983
  • 05:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc5 T395983', diff saved to https://phabricator.wikimedia.org/P77084 and previous config saved to /var/cache/conftool/dbconfig/20250605-055121-marostegui.json
  • 05:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc4 T395983', diff saved to https://phabricator.wikimedia.org/P77083 and previous config saved to /var/cache/conftool/dbconfig/20250605-055013-marostegui.json
  • 05:48 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2014.codfw.wmnet,pc1014.eqiad.wmnet with reason: Maintenance
  • 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc4 T395983', diff saved to https://phabricator.wikimedia.org/P77082 and previous config saved to /var/cache/conftool/dbconfig/20250605-054806-marostegui.json
  • 05:47 marostegui: Change datadir on pc4 dbmaint eqiad codfw T395983
  • 05:47 marostegui: Change datadir on pc3 dbmaint eqiad codfw T395983
  • 05:41 marostegui@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77081 and previous config saved to /var/cache/conftool/dbconfig/20250605-054113-root.json
  • 05:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc3 T395983', diff saved to https://phabricator.wikimedia.org/P77080 and previous config saved to /var/cache/conftool/dbconfig/20250605-053647-marostegui.json
  • 05:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2013.codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance
  • 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc3 T395983', diff saved to https://phabricator.wikimedia.org/P77079 and previous config saved to /var/cache/conftool/dbconfig/20250605-053317-marostegui.json
  • 05:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc2 T395983', diff saved to https://phabricator.wikimedia.org/P77078 and previous config saved to /var/cache/conftool/dbconfig/20250605-052934-marostegui.json
  • 05:26 marostegui@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77077 and previous config saved to /var/cache/conftool/dbconfig/20250605-052604-root.json
  • 05:25 marostegui: Change datadir on pc2 dbmaint eqiad codfw T395983
  • 05:25 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2012.codfw.wmnet,pc1012.eqiad.wmnet with reason: Maintenance
  • 05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc2 T395983', diff saved to https://phabricator.wikimedia.org/P77076 and previous config saved to /var/cache/conftool/dbconfig/20250605-052442-marostegui.json
  • 05:20 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 05:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2180 T395989', diff saved to https://phabricator.wikimedia.org/P77075 and previous config saved to /var/cache/conftool/dbconfig/20250605-052003-marostegui.json

2025-06-04

  • 23:55 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on A:ncredir
  • 22:45 brett@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on A:ncredir
  • 22:30 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7016.magru.wmnet
  • 22:27 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
  • 22:20 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7014.magru.wmnet
  • 22:18 damilare: SmashPig upgraded from d08693e5 to 3222a1f3
  • 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump cache key version in EventStore (T396075) (duration: 13m 54s)
  • 22:12 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7016.magru.wmnet
  • 22:12 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7015.magru.wmnet
  • 22:12 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7015.magru.wmnet
  • 22:11 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7015.magru.wmnet
  • 22:11 brett: sudo -i cumin 'A:ncredir' 'depool && apt-get update && apt-get upgrade -y && pool' -b1 -s10
  • 22:09 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 22:04 ladsgroup@deploy1003: ladsgroup: Backport for Bump cache key version in EventStore (T396075) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:02 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump cache key version in EventStore (T396075)
  • 22:02 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7015.magru.wmnet
  • 22:02 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7014.magru.wmnet
  • 22:02 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7013.magru.wmnet
  • 21:58 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7012.magru.wmnet
  • 21:43 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7013.magru.wmnet
  • 21:42 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7011.magru.wmnet
  • 21:40 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7012.magru.wmnet
  • 21:40 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7010.magru.wmnet
  • 21:39 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7010.magru.wmnet
  • 21:35 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7010.magru.wmnet
  • 21:29 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20250526/ using stat1011.eqiad.wmnet)
  • 21:25 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7011.magru.wmnet
  • 21:25 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7010.magru.wmnet
  • 21:24 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20250526/ using stat1009.eqiad.wmnet)
  • 21:22 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7009.magru.wmnet
  • 21:14 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7008.magru.wmnet
  • 21:07 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
  • 21:06 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:05 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7009.magru.wmnet
  • 21:05 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7008.magru.wmnet
  • 21:04 cjming: end of UTC late backport window
  • 21:04 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7007.magru.wmnet
  • 21:02 cjming@deploy1003: Finished scap sync-world: Backport for SUL3: Retry local login on failure due to invalid/expired login token (T390784), SUL3: Retry local login on failure… (follow-ups) (T390784), SUL3: Retry local login on failure due to invalid/expired login token (T390784), SUL3: Retry local login on failure… (follow-ups) (T390784) (d
  • 21:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 20:55 cjming@deploy1003: matmarex, cjming: Continuing with sync
  • 20:55 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:54 cjming@deploy1003: matmarex, cjming: Backport for SUL3: Retry local login on failure due to invalid/expired login token (T390784), SUL3: Retry local login on failure… (follow-ups) (T390784), SUL3: Retry local login on failure due to invalid/expired login token (T390784), SUL3: Retry local login on failure… (follow-ups) (T390784) synced to
  • 20:51 cjming@deploy1003: Started scap sync-world: Backport for SUL3: Retry local login on failure due to invalid/expired login token (T390784), SUL3: Retry local login on failure… (follow-ups) (T390784), SUL3: Retry local login on failure due to invalid/expired login token (T390784), SUL3: Retry local login on failure… (follow-ups) (T390784)
  • 20:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 20:50 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7006.magru.wmnet
  • 20:46 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7007.magru.wmnet
  • 20:44 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7005.magru.wmnet
  • 20:40 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7006.magru.wmnet
  • 20:38 cjming@deploy1003: Finished scap sync-world: Backport for Treat File::getShortDesc() as possibly unsafe HTML (T395834), Treat File::getShortDesc() as possibly unsafe HTML (T395834) (duration: 15m 37s)
  • 20:37 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7004.magru.wmnet
  • 20:31 cjming@deploy1003: cjming, matmarex: Continuing with sync
  • 20:26 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7005.magru.wmnet
  • 20:26 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7004.magru.wmnet
  • 20:25 cjming@deploy1003: cjming, matmarex: Backport for Treat File::getShortDesc() as possibly unsafe HTML (T395834), Treat File::getShortDesc() as possibly unsafe HTML (T395834) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:23 cjming@deploy1003: Started scap sync-world: Backport for Treat File::getShortDesc() as possibly unsafe HTML (T395834), Treat File::getShortDesc() as possibly unsafe HTML (T395834)
  • 20:20 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7003.magru.wmnet
  • 20:18 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7002.magru.wmnet
  • 20:15 cjming@deploy1003: Finished scap sync-world: Backport for beta cluster: Set $wgOATHAuthAccountPrefix (T396061) (duration: 10m 13s)
  • 20:10 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7003.magru.wmnet
  • 20:09 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7002.magru.wmnet
  • 20:08 cjming@deploy1003: lucaswerkmeister, cjming: Continuing with sync
  • 20:07 cjming@deploy1003: lucaswerkmeister, cjming: Backport for beta cluster: Set $wgOATHAuthAccountPrefix (T396061) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:05 cjming@deploy1003: Started scap sync-world: Backport for beta cluster: Set $wgOATHAuthAccountPrefix (T396061)
  • 20:03 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp3072.esams.wmnet
  • 19:54 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp3072.esams.wmnet
  • 19:42 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7001.magru.wmnet
  • 19:36 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 19:31 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7001.magru.wmnet
  • 19:23 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 19:22 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 19:13 sukhe@dns1004: END - running authdns-update
  • 19:12 sukhe@dns1004: START - running authdns-update
  • 19:11 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7002.wikimedia.org [reason: repooling after reboot]
  • 19:11 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7002.magru.wmnet [reason: repooling after reboot]
  • 19:11 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns7002.wikimedia.org
  • 19:11 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns7002.wikimedia.org
  • 19:10 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on A:dnsbox and A:magru and (A:dnsbox)
  • 19:10 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns7002.wikimedia.org
  • 19:10 bvibber@deploy1003: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
  • 19:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 19:10 bvibber@deploy1003: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
  • 19:10 dreamyjazz@deploy1003: Finished scap sync-world: Backport for CustomBlockedDomainStorage::fetchConfig: Cast LinkTarget to a Title for RevisionStore (T396056) (duration: 12m 27s)
  • 19:10 bvibber@deploy1003: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
  • 19:09 bvibber@deploy1003: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
  • 19:09 bvibber@deploy1003: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
  • 19:08 bvibber@deploy1003: helmfile [staging] START helmfile.d/services/chart-renderer: apply
  • 19:03 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 19:00 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 19:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:00 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:59 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:59 dreamyjazz@deploy1003: dreamyjazz: Backport for CustomBlockedDomainStorage::fetchConfig: Cast LinkTarget to a Title for RevisionStore (T396056) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:57 dreamyjazz@deploy1003: Started scap sync-world: Backport for CustomBlockedDomainStorage::fetchConfig: Cast LinkTarget to a Title for RevisionStore (T396056)
  • 18:55 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns7002.wikimedia.org
  • 18:51 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7001.*
  • 18:49 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:45 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns7001.wikimedia.org
  • 18:37 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp7001.*
  • 18:37 brett: depooling cp7001 for CPU stress testing and temperature effects (T373993)
  • 18:29 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns7001.wikimedia.org
  • 18:29 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox and A:magru and (A:dnsbox)
  • 18:24 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 18:24 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 18:18 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.4 refs T392174
  • 18:16 damilare: SmashPig upgraded from a99f2265 to d08693e5
  • 18:15 sukhe: puppet re-enabled on A:cp and finished rolling out removal of ats-be from single backend cp nodes: T288106
  • 18:14 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:13 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 18:03 bvibber@deploy1003: Finished scap sync-world: Backport for Update Charts so they can render from data-mw-charts as well as data-charts (T395462), Update Charts so they can render from data-mw-charts as well as data-charts (T395462) (duration: 10m 05s)
  • 17:56 bvibber@deploy1003: bvibber: Continuing with sync
  • 17:55 bvibber@deploy1003: bvibber: Backport for Update Charts so they can render from data-mw-charts as well as data-charts (T395462), Update Charts so they can render from data-mw-charts as well as data-charts (T395462) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 17:53 bvibber@deploy1003: Started scap sync-world: Backport for Update Charts so they can render from data-mw-charts as well as data-charts (T395462), Update Charts so they can render from data-mw-charts as well as data-charts (T395462)
  • 17:50 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: sync
  • 17:49 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: sync
  • 17:48 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 17:48 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 17:36 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:35 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:15 cgoubert@deploy1003: Finished scap sync-world: 1153647: mediawiki: Fix captcha configmap structure - T388531 (duration: 02m 39s)
  • 17:13 cgoubert@deploy1003: Started scap sync-world: 1153647: mediawiki: Fix captcha configmap structure - T388531
  • 16:41 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
  • 16:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:32 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:30 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1186
  • 16:30 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1186
  • 16:27 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1165.eqiad.wmnet
  • 16:27 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1165.eqiad.wmnet
  • 16:22 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:22 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1186 - vriley@cumin1002"
  • 16:22 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1186 - vriley@cumin1002"
  • 16:19 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts an-worker1164.eqiad.wmnet
  • 16:18 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1164.eqiad.wmnet
  • 16:18 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1165.eqiad.wmnet
  • 16:16 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1165.eqiad.wmnet
  • 16:14 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 16:14 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1164.eqiad.wmnet
  • 16:13 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1164.eqiad.wmnet
  • 16:13 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:12 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1164.eqiad.wmnet
  • 16:12 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-durum (exit_code=0) rolling reboot on A:durum and not A:magru and A:durum
  • 16:12 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:11 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling reboot on A:wikidough and not A:magru and A:wikidough
  • 16:10 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1164.eqiad.wmnet
  • 16:07 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1163.eqiad.wmnet
  • 16:04 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:03 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1163.eqiad.wmnet
  • 16:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T395241)', diff saved to https://phabricator.wikimedia.org/P77074 and previous config saved to /var/cache/conftool/dbconfig/20250604-160120-fceratto.json
  • 15:56 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:56 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P77073 and previous config saved to /var/cache/conftool/dbconfig/20250604-154611-fceratto.json
  • 15:42 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
  • 15:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P77072 and previous config saved to /var/cache/conftool/dbconfig/20250604-153104-fceratto.json
  • 15:30 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
  • 15:29 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
  • 15:24 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Set wgCheckUserDisableCheckUserAPI to false on loginwiki (T396010) (duration: 10m 03s)
  • 15:20 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
  • 15:20 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
  • 15:18 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:18 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:18 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 15:17 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 15:16 dreamyjazz@deploy1003: dreamyjazz: Backport for Set wgCheckUserDisableCheckUserAPI to false on loginwiki (T396010) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T395241)', diff saved to https://phabricator.wikimedia.org/P77071 and previous config saved to /var/cache/conftool/dbconfig/20250604-151556-fceratto.json
  • 15:15 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-durum rolling reboot on A:durum and not A:magru and A:durum
  • 15:14 dreamyjazz@deploy1003: Started scap sync-world: Backport for Set wgCheckUserDisableCheckUserAPI to false on loginwiki (T396010)
  • 15:14 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling reboot on A:wikidough and not A:magru and A:wikidough
  • 15:14 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:08 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 15:07 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T395241)', diff saved to https://phabricator.wikimedia.org/P77070 and previous config saved to /var/cache/conftool/dbconfig/20250604-150740-fceratto.json
  • 15:07 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 15:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T395241)', diff saved to https://phabricator.wikimedia.org/P77069 and previous config saved to /var/cache/conftool/dbconfig/20250604-150716-fceratto.json
  • 15:05 jiji@deploy1003: Finished scap sync-world: T276994: Chart bump, noop (duration: 02m 52s)
  • 15:04 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
  • 15:04 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:02 jiji@deploy1003: Started scap sync-world: T276994: Chart bump, noop
  • 15:00 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 15:00 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 15:00 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 15:00 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1163.eqiad.wmnet
  • 14:58 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1163.eqiad.wmnet
  • 14:58 moritzm: installing Linux 5.10.237 on Bullseye hosts
  • 14:55 cmooney@dns2005: END - running authdns-update
  • 14:54 cmooney@dns2005: START - running authdns-update
  • 14:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P77068 and previous config saved to /var/cache/conftool/dbconfig/20250604-145209-fceratto.json
  • 14:50 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:50 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove DNS entries for IPs used in Nokia test lab codfw - cmooney@cumin1002"
  • 14:49 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove DNS entries for IPs used in Nokia test lab codfw - cmooney@cumin1002"
  • 14:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1094.eqiad.wmnet with OS bullseye
  • 14:49 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:49 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:47 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2020.codfw.wmnet
  • 14:47 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
  • 14:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:46 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
  • 14:46 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 14:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1095.eqiad.wmnet with OS bullseye
  • 14:46 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:44 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:43 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
  • 14:41 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
  • 14:38 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 14:37 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 14:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P77067 and previous config saved to /var/cache/conftool/dbconfig/20250604-143702-fceratto.json
  • 14:36 cgoubert@deploy1003: Finished scap sync-world: 1153634: mediawiki: Fix captcha wordlists path - T388531 (duration: 02m 33s)
  • 14:33 cgoubert@deploy1003: Started scap sync-world: 1153634: mediawiki: Fix captcha wordlists path - T388531
  • 14:32 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 14:32 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 14:31 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2020.codfw.wmnet
  • 14:31 cgoubert@deploy1003: Finished scap sync-world: 1153634: mediawiki: Fix captcha wordlists path - T388531 (duration: 02m 24s)
  • 14:29 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2019.codfw.wmnet
  • 14:29 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
  • 14:28 cgoubert@deploy1003: Started scap sync-world: 1153634: mediawiki: Fix captcha wordlists path - T388531
  • 14:28 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:27 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:27 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:26 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:26 cgoubert@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:25 cgoubert@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:24 cgoubert@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:24 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doh7001.wikimedia.org
  • 14:24 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:24 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh7001.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin1002"
  • 14:23 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh7001.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin1002"
  • 14:23 cgoubert@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:23 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
  • 14:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T395241)', diff saved to https://phabricator.wikimedia.org/P77066 and previous config saved to /var/cache/conftool/dbconfig/20250604-142155-fceratto.json
  • 14:21 cgoubert@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:21 cgoubert@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:20 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
  • 14:19 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum7001.magru.wmnet
  • 14:19 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:19 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin1002"
  • 14:19 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin1002"
  • 14:19 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2010-dev.codfw.wmnet with OS bullseye
  • 14:19 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
  • 14:17 sukhe@cumin1002: START - Cookbook sre.dns.netbox
  • 14:16 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 14:14 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
  • 14:13 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2019.codfw.wmnet
  • 14:12 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T395241)', diff saved to https://phabricator.wikimedia.org/P77065 and previous config saved to /var/cache/conftool/dbconfig/20250604-141238-fceratto.json
  • 14:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 14:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T395241)', diff saved to https://phabricator.wikimedia.org/P77064 and previous config saved to /var/cache/conftool/dbconfig/20250604-141213-fceratto.json
  • 14:11 sukhe@cumin1002: START - Cookbook sre.dns.netbox
  • 14:11 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 14:08 sukhe: decommissioning doh7001 and durum7001: T396015
  • 14:07 sukhe@cumin1002: START - Cookbook sre.hosts.decommission for hosts doh7001.wikimedia.org
  • 14:07 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:07 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:07 sukhe@cumin1002: START - Cookbook sre.hosts.decommission for hosts durum7001.magru.wmnet
  • 14:06 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:06 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:05 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:05 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:04 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
  • 14:04 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
  • 14:02 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 14:02 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 14:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1094.eqiad.wmnet with reason: host reimage
  • 13:58 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
  • 13:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P77062 and previous config saved to /var/cache/conftool/dbconfig/20250604-135706-fceratto.json
  • 13:56 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1094.eqiad.wmnet with reason: host reimage
  • 13:54 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
  • 13:51 claime: Manual run of generatecaptcha on mw-cron, no delete - T388531
  • 13:51 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 13:50 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 13:49 sukhe: sudo cumin -b1 -s15 'A:cp' 'run-puppet-agent --enable "merging CR 1114074"': T288106
  • 13:48 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
  • 13:48 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
  • 13:47 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana2001.codfw.wmnet
  • 13:46 sukhe: forcing ats-backend-restart on cp1104
  • 13:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1095.eqiad.wmnet with reason: host reimage
  • 13:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2217 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77061 and previous config saved to /var/cache/conftool/dbconfig/20250604-134336-root.json
  • 13:43 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
  • 13:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P77060 and previous config saved to /var/cache/conftool/dbconfig/20250604-134158-fceratto.json
  • 13:41 tappof@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host grafana2001.codfw.wmnet
  • 13:41 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
  • 13:40 tappof@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host grafana2001.codfw.wmnet
  • 13:40 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
  • 13:40 samtar@deploy1003: Finished scap sync-world: Backport for IS: Undo turning on wgTemplateDataEnableCategoryBrowser for mw.org (T377975) (duration: 09m 57s)
  • 13:39 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1095.eqiad.wmnet with reason: host reimage
  • 13:38 tappof@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host grafana2001.codfw.wmnet
  • 13:38 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
  • 13:37 sukhe: forcing agent run on cp2037 (non-single BE node): CR 1114074
  • 13:37 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be1094.eqiad.wmnet with OS bullseye
  • 13:33 samtar@deploy1003: samtar: Continuing with sync
  • 13:32 samtar@deploy1003: samtar: Backport for IS: Undo turning on wgTemplateDataEnableCategoryBrowser for mw.org (T377975) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:30 sukhe: forcing agent run on cp7001 (single BE node): CR 1114074
  • 13:30 samtar@deploy1003: Started scap sync-world: Backport for IS: Undo turning on wgTemplateDataEnableCategoryBrowser for mw.org (T377975)
  • 13:29 sukhe: forcing agent run on cp6015: CR 1114074
  • 13:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2217 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77059 and previous config saved to /var/cache/conftool/dbconfig/20250604-132829-root.json
  • 13:27 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1013.eqiad.wmnet
  • 13:27 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs1013.eqiad.wmnet
  • 13:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T395241)', diff saved to https://phabricator.wikimedia.org/P77058 and previous config saved to /var/cache/conftool/dbconfig/20250604-132648-fceratto.json
  • 13:23 sukhe: starting removal of ats-be service from eqiad, eqsin, esams, magru, ulsfo: T288106
  • 13:21 sukhe: sudo cumin 'A:cp' 'disable-puppet "merging CR 1114074"'
  • 13:20 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be1095.eqiad.wmnet with OS bullseye
  • 13:18 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T395241)', diff saved to https://phabricator.wikimedia.org/P77057 and previous config saved to /var/cache/conftool/dbconfig/20250604-131852-fceratto.json
  • 13:18 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 13:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T395241)', diff saved to https://phabricator.wikimedia.org/P77056 and previous config saved to /var/cache/conftool/dbconfig/20250604-131827-fceratto.json
  • 13:14 jforrester@deploy1003: Finished scap sync-world: Backport for release CampaignEvents to cbk-zam wiki (T393604), Bump portals to the 2025-06-02 09:23:11+00:00 build (T128546), build: Rename the rarely-used 'typos' script to 'checkTypos', Drop Chart roll-out dblists, no longer needed (T383079) (duration: 10m 29s)
  • 13:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2217 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77055 and previous config saved to /var/cache/conftool/dbconfig/20250604-131323-root.json
  • 13:11 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 13:11 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 13:08 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 13:07 jforrester@deploy1003: jforrester, mhorsey: Continuing with sync
  • 13:06 jforrester@deploy1003: jforrester, mhorsey: Backport for release CampaignEvents to cbk-zam wiki (T393604), Bump portals to the 2025-06-02 09:23:11+00:00 build (T128546), build: Rename the rarely-used 'typos' script to 'checkTypos', Drop Chart roll-out dblists, no longer needed (T383079) synced to the testservers (see https://wikitech.wikimedia
  • 13:04 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti7001.magru.wmnet to cluster magru03 and group B
  • 13:04 jforrester@deploy1003: Started scap sync-world: Backport for release CampaignEvents to cbk-zam wiki (T393604), Bump portals to the 2025-06-02 09:23:11+00:00 build (T128546), build: Rename the rarely-used 'typos' script to 'checkTypos', Drop Chart roll-out dblists, no longer needed (T383079)
  • 13:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P77054 and previous config saved to /var/cache/conftool/dbconfig/20250604-130319-fceratto.json
  • 13:03 jmm@cumin1003: START - Cookbook sre.ganeti.addnode for new host ganeti7001.magru.wmnet to cluster magru03 and group B
  • 13:02 sbassett@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 13:02 sbassett@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 13:02 sbassett@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 13:01 sbassett@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 13:01 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7001.magru.wmnet
  • 12:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2217 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77053 and previous config saved to /var/cache/conftool/dbconfig/20250604-125817-root.json
  • 12:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2217 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77051 and previous config saved to /var/cache/conftool/dbconfig/20250604-124311-root.json
  • 12:43 andrew@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1004.eqiad.wmnet
  • 12:42 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1003.eqiad.wmnet
  • 12:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 12:39 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7001.magru.wmnet with OS bookworm
  • 12:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 12:37 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ms-be1095.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:36 moritzm: installing modsecurity-apache security updates
  • 12:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ms-be1094.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:35 andrew@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1003.eqiad.wmnet
  • 12:35 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:35 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for ms-be1094/95 - jclark@cumin1002"
  • 12:35 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for ms-be1094/95 - jclark@cumin1002"
  • 12:34 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bullseye
  • 12:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T395241)', diff saved to https://phabricator.wikimedia.org/P77050 and previous config saved to /var/cache/conftool/dbconfig/20250604-123304-fceratto.json
  • 12:32 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 12:29 marostegui@cumin1002: dbctl commit (dc=all): 'db2193 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77049 and previous config saved to /var/cache/conftool/dbconfig/20250604-122948-root.json
  • 12:28 reedy@deploy1003: Finished scap sync-world: Backport for GenerateFancyCaptchas: Handle captcha.py not generating any captchas, but not erroring (T388531), captcha.py: Expand variables and user in filenames (T395810), captcha.py: Check if output dir exists, and attempt to create it (else error) (T395804), captcha.py: Bail out if no words were rea
  • 12:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2217 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77048 and previous config saved to /var/cache/conftool/dbconfig/20250604-122806-root.json
  • 12:27 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:27 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcontrol2010-dev.codfw.wmnet on all recursors
  • 12:27 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache cloudcontrol2010-dev.codfw.wmnet on all recursors
  • 12:26 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcephosd2010-dev.codfw.wmnet on all recursors
  • 12:26 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache cloudcephosd2010-dev.codfw.wmnet on all recursors
  • 12:25 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 12:25 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:25 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix entries for cloudcontrol2010-dev which had been added on wrong vlan - cmooney@cumin1002"
  • 12:25 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix entries for cloudcontrol2010-dev which had been added on wrong vlan - cmooney@cumin1002"
  • 12:24 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T395241)', diff saved to https://phabricator.wikimedia.org/P77047 and previous config saved to /var/cache/conftool/dbconfig/20250604-122436-fceratto.json
  • 12:24 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 12:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T395241)', diff saved to https://phabricator.wikimedia.org/P77046 and previous config saved to /var/cache/conftool/dbconfig/20250604-122411-fceratto.json
  • 12:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2217 T395989', diff saved to https://phabricator.wikimedia.org/P77045 and previous config saved to /var/cache/conftool/dbconfig/20250604-122303-marostegui.json
  • 12:22 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 12:21 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 12:21 reedy@deploy1003: reedy: Continuing with sync
  • 12:21 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:20 reedy@deploy1003: reedy: Backport for GenerateFancyCaptchas: Handle captcha.py not generating any captchas, but not erroring (T388531), captcha.py: Expand variables and user in filenames (T395810), captcha.py: Check if output dir exists, and attempt to create it (else error) (T395804), captcha.py: Bail out if no words were read from wordlist (T3
  • 12:18 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 12:18 reedy@deploy1003: Started scap sync-world: Backport for GenerateFancyCaptchas: Handle captcha.py not generating any captchas, but not erroring (T388531), captcha.py: Expand variables and user in filenames (T395810), captcha.py: Check if output dir exists, and attempt to create it (else error) (T395804), captcha.py: Bail out if no words were read
  • 12:16 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7001.magru.wmnet with reason: host reimage
  • 12:14 marostegui@cumin1002: dbctl commit (dc=all): 'db2193 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77044 and previous config saved to /var/cache/conftool/dbconfig/20250604-121442-root.json
  • 12:12 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7001.magru.wmnet with reason: host reimage
  • 12:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P77043 and previous config saved to /var/cache/conftool/dbconfig/20250604-120904-fceratto.json
  • 11:59 marostegui@cumin1002: dbctl commit (dc=all): 'db2193 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77041 and previous config saved to /var/cache/conftool/dbconfig/20250604-115936-root.json
  • 11:58 samtar@deploy1003: Finished scap sync-world: Backport for IS/IS-labs: Enable TemplateDiscovery flags for mediawikiwiki (T377975) (duration: 12m 28s)
  • 11:55 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts krb1001.eqiad.wmnet
  • 11:55 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:55 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: krb1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 11:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P77040 and previous config saved to /var/cache/conftool/dbconfig/20250604-115357-fceratto.json
  • 11:51 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: krb1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 11:51 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti7001.magru.wmnet with OS bookworm
  • 11:51 samtar@deploy1003: samtar: Continuing with sync
  • 11:47 samtar@deploy1003: samtar: Backport for IS/IS-labs: Enable TemplateDiscovery flags for mediawikiwiki (T377975) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:45 samtar@deploy1003: Started scap sync-world: Backport for IS/IS-labs: Enable TemplateDiscovery flags for mediawikiwiki (T377975)
  • 11:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2193 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77038 and previous config saved to /var/cache/conftool/dbconfig/20250604-114430-root.json
  • 11:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T395241)', diff saved to https://phabricator.wikimedia.org/P77037 and previous config saved to /var/cache/conftool/dbconfig/20250604-113849-fceratto.json
  • 11:38 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 11:35 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:35 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 11:35 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:34 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 11:34 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
  • 11:34 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
  • 11:33 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:32 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:32 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts krb1001.eqiad.wmnet
  • 11:31 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:31 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:30 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T395241)', diff saved to https://phabricator.wikimedia.org/P77036 and previous config saved to /var/cache/conftool/dbconfig/20250604-113030-fceratto.json
  • 11:30 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 11:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T395241)', diff saved to https://phabricator.wikimedia.org/P77035 and previous config saved to /var/cache/conftool/dbconfig/20250604-113005-fceratto.json
  • 11:29 marostegui@cumin1002: dbctl commit (dc=all): 'db2193 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77034 and previous config saved to /var/cache/conftool/dbconfig/20250604-112923-root.json
  • 11:27 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
  • 11:25 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:25 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:22 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:22 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:20 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
  • 11:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P77033 and previous config saved to /var/cache/conftool/dbconfig/20250604-111457-fceratto.json
  • 11:14 claime: Deployed k8s-controller-sidecars version 1.0.2-3
  • 11:14 marostegui@cumin1002: dbctl commit (dc=all): 'db2193 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77032 and previous config saved to /var/cache/conftool/dbconfig/20250604-111418-root.json
  • 11:10 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 11:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2193 T395989', diff saved to https://phabricator.wikimedia.org/P77031 and previous config saved to /var/cache/conftool/dbconfig/20250604-110955-marostegui.json
  • 11:06 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:06 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:06 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:05 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:05 cgoubert@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:04 cgoubert@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:04 cgoubert@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:03 cgoubert@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 10:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P77030 and previous config saved to /var/cache/conftool/dbconfig/20250604-105950-fceratto.json
  • 10:52 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7001.magru.wmnet to cluster magru03 and group B
  • 10:51 jmm@cumin1003: START - Cookbook sre.ganeti.addnode for new host ganeti7001.magru.wmnet to cluster magru03 and group B
  • 10:45 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 10:44 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T395241)', diff saved to https://phabricator.wikimedia.org/P77029 and previous config saved to /var/cache/conftool/dbconfig/20250604-104443-fceratto.json
  • 10:38 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7001.magru.wmnet to cluster magru03 and group B
  • 10:37 jmm@cumin1003: START - Cookbook sre.ganeti.addnode for new host ganeti7001.magru.wmnet to cluster magru03 and group B
  • 10:36 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T395241)', diff saved to https://phabricator.wikimedia.org/P77028 and previous config saved to /var/cache/conftool/dbconfig/20250604-103629-fceratto.json
  • 10:36 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 10:36 moritzm: failover ganeti master in ulsfo to ganeti4005
  • 10:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T395241)', diff saved to https://phabricator.wikimedia.org/P77027 and previous config saved to /var/cache/conftool/dbconfig/20250604-103604-fceratto.json
  • 10:35 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4007.ulsfo.wmnet
  • 10:35 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7001.magru.wmnet
  • 10:35 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
  • 10:32 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77026 and previous config saved to /var/cache/conftool/dbconfig/20250604-103233-root.json
  • 10:29 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
  • 10:25 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4007.ulsfo.wmnet
  • 10:25 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti7001.magru.wmnet
  • 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77025 and previous config saved to /var/cache/conftool/dbconfig/20250604-102351-root.json
  • 10:23 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
  • 10:23 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
  • 10:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P77024 and previous config saved to /var/cache/conftool/dbconfig/20250604-102056-fceratto.json
  • 10:17 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77023 and previous config saved to /var/cache/conftool/dbconfig/20250604-101728-root.json
  • 10:17 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
  • 10:09 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
  • 10:08 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77022 and previous config saved to /var/cache/conftool/dbconfig/20250604-100846-root.json
  • 10:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P77021 and previous config saved to /var/cache/conftool/dbconfig/20250604-100547-fceratto.json
  • 10:05 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: switching to katran
  • 10:04 vgutierrez: upload liberica 0.15 to bookworm-wikimedia (apt.wm.o) - T395228
  • 10:02 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77020 and previous config saved to /var/cache/conftool/dbconfig/20250604-100222-root.json
  • 10:00 vgutierrez: depool lvs1013 before switching to katran - T395228
  • 09:59 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:59 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:55 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 09:55 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 09:53 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77018 and previous config saved to /var/cache/conftool/dbconfig/20250604-095340-root.json
  • 09:52 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 09:52 akosiaris: re-deploy changeprop-jobqueue to set the Host HTTP header for submission of all jobs. T395451
  • 09:52 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 09:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T395241)', diff saved to https://phabricator.wikimedia.org/P77017 and previous config saved to /var/cache/conftool/dbconfig/20250604-095041-fceratto.json
  • 09:47 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77016 and previous config saved to /var/cache/conftool/dbconfig/20250604-094715-root.json
  • 09:46 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 09:46 akosiaris: T395451 deploy mw-jobrunner hot patch for VirtualHost selection, testing out that the single version change will work this time around.
  • 09:46 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 09:42 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T395241)', diff saved to https://phabricator.wikimedia.org/P77015 and previous config saved to /var/cache/conftool/dbconfig/20250604-094217-fceratto.json
  • 09:42 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4005.ulsfo.wmnet
  • 09:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 09:42 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4005.ulsfo.wmnet
  • 09:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T395241)', diff saved to https://phabricator.wikimedia.org/P77014 and previous config saved to /var/cache/conftool/dbconfig/20250604-094152-fceratto.json
  • 09:40 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7001.magru.wmnet with OS bookworm
  • 09:38 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77013 and previous config saved to /var/cache/conftool/dbconfig/20250604-093835-root.json
  • 09:37 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:37 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:36 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti4005.ulsfo.wmnet
  • 09:33 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4005.ulsfo.wmnet
  • 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'db2224 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77011 and previous config saved to /var/cache/conftool/dbconfig/20250604-093251-root.json
  • 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77010 and previous config saved to /var/cache/conftool/dbconfig/20250604-093209-root.json
  • 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc1 T395983', diff saved to https://phabricator.wikimedia.org/P77009 and previous config saved to /var/cache/conftool/dbconfig/20250604-092819-marostegui.json
  • 09:27 marostegui: Move datadir on pc1011 dbmaint pc1 eqiad T395983
  • 09:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P77008 and previous config saved to /var/cache/conftool/dbconfig/20250604-092645-fceratto.json
  • 09:24 taavi@cumin1002: conftool action : set/weight=100:pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=x3
  • 09:23 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77007 and previous config saved to /var/cache/conftool/dbconfig/20250604-092328-root.json
  • 09:20 marostegui: Move datadir on pc2011 dbmaint pc1 codfw T395983
  • 09:19 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2011.codfw.wmnet,pc1011.eqiad.wmnet with reason: Maintenance
  • 09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc1 T395983', diff saved to https://phabricator.wikimedia.org/P77006 and previous config saved to /var/cache/conftool/dbconfig/20250604-091921-marostegui.json
  • 09:17 marostegui@cumin1002: dbctl commit (dc=all): 'db2224 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77005 and previous config saved to /var/cache/conftool/dbconfig/20250604-091745-root.json
  • 09:17 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77004 and previous config saved to /var/cache/conftool/dbconfig/20250604-091704-root.json
  • 09:16 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:16 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7001.magru.wmnet with reason: host reimage
  • 09:16 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:15 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 09:15 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 09:15 akosiaris: T395451 rollback the host header addition, this is erroring out, returning 404s.
  • 09:14 akosiaris: T395451 rollback the host header addition, this is erroring out, returning 3xx.
  • 09:14 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 09:14 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 09:14 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 09:12 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7001.magru.wmnet with reason: host reimage
  • 09:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P77003 and previous config saved to /var/cache/conftool/dbconfig/20250604-091138-fceratto.json
  • 09:11 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 09:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2044.codfw.wmnet with reason: Maintenance
  • 09:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2044', diff saved to https://phabricator.wikimedia.org/P77002 and previous config saved to /var/cache/conftool/dbconfig/20250604-091041-marostegui.json
  • 09:10 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 09:10 moritzm: installing qemu bugfix updates from Bookworm point release
  • 09:08 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f1-codfw
  • 09:08 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77001 and previous config saved to /var/cache/conftool/dbconfig/20250604-090823-root.json
  • 09:06 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-f1-codfw
  • 09:05 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 09:03 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 09:02 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 09:02 marostegui@cumin1002: dbctl commit (dc=all): 'db2224 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77000 and previous config saved to /var/cache/conftool/dbconfig/20250604-090240-root.json
  • 09:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2041', diff saved to https://phabricator.wikimedia.org/P76999 and previous config saved to /var/cache/conftool/dbconfig/20250604-090226-marostegui.json
  • 09:02 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 09:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2041.codfw.wmnet with reason: Maintenance
  • 09:01 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 08:57 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 08:57 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 08:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T395241)', diff saved to https://phabricator.wikimedia.org/P76998 and previous config saved to /var/cache/conftool/dbconfig/20250604-085630-fceratto.json
  • 08:53 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 08:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 08:51 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti7001.magru.wmnet with OS bookworm
  • 08:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T395241)', diff saved to https://phabricator.wikimedia.org/P76997 and previous config saved to /var/cache/conftool/dbconfig/20250604-084819-fceratto.json
  • 08:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 08:47 marostegui@cumin1002: dbctl commit (dc=all): 'db2224 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76996 and previous config saved to /var/cache/conftool/dbconfig/20250604-084735-root.json
  • 08:42 akosiaris: deploy changeprop-jobqueue to set the Host HTTP header for submission of all jobs. T395451
  • 08:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 08:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T395241)', diff saved to https://phabricator.wikimedia.org/P76995 and previous config saved to /var/cache/conftool/dbconfig/20250604-084231-fceratto.json
  • 08:42 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 08:41 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 08:38 moritzm: removing ganeti7001 from magru01 cluster T394263
  • 08:38 marostegui: Change s6 eqiad dbmaint to SBR T383795
  • 08:38 akosiaris: revoke and clean helm-charts.discovery.wmnet old cergen cert from puppetmaster1001
  • 08:32 marostegui@cumin1002: dbctl commit (dc=all): 'db2224 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76994 and previous config saved to /var/cache/conftool/dbconfig/20250604-083229-root.json
  • 08:28 marostegui: Change s6 codfw dbmaint to SBR T383795
  • 08:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P76993 and previous config saved to /var/cache/conftool/dbconfig/20250604-082725-fceratto.json
  • 08:17 marostegui@cumin1002: dbctl commit (dc=all): 'db2224 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76992 and previous config saved to /var/cache/conftool/dbconfig/20250604-081725-root.json
  • 08:14 moritzm: removing atlas7001 from magru01 cluster T394263
  • 08:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P76991 and previous config saved to /var/cache/conftool/dbconfig/20250604-081219-fceratto.json
  • 08:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2224.codfw.wmnet with reason: Maintenance
  • 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2224 T395989', diff saved to https://phabricator.wikimedia.org/P76990 and previous config saved to /var/cache/conftool/dbconfig/20250604-081058-marostegui.json
  • 08:05 moritzm: installing gcc-12 bugfix updates from Bookworm point releases (includes various run time libraries)
  • 08:05 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76989 and previous config saved to /var/cache/conftool/dbconfig/20250604-080546-root.json
  • 08:03 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt2002.wikimedia.org
  • 08:02 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host apt2002.wikimedia.org
  • 07:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T395241)', diff saved to https://phabricator.wikimedia.org/P76988 and previous config saved to /var/cache/conftool/dbconfig/20250604-075711-fceratto.json
  • 07:50 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76987 and previous config saved to /var/cache/conftool/dbconfig/20250604-075041-root.json
  • 07:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T395241)', diff saved to https://phabricator.wikimedia.org/P76986 and previous config saved to /var/cache/conftool/dbconfig/20250604-074850-fceratto.json
  • 07:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 07:39 marostegui@cumin1002: dbctl commit (dc=all): 'es2048 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76985 and previous config saved to /var/cache/conftool/dbconfig/20250604-073921-root.json
  • 07:37 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow7001.magru.wmnet to plain
  • 07:37 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow7001.magru.wmnet to plain
  • 07:35 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76984 and previous config saved to /var/cache/conftool/dbconfig/20250604-073535-root.json
  • 07:34 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7001.magru.wmnet to plain
  • 07:31 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7001.magru.wmnet to plain
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'es2048 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76983 and previous config saved to /var/cache/conftool/dbconfig/20250604-072416-root.json
  • 07:23 Emperor: restart swift-object-replicator ms-be2066
  • 07:20 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76982 and previous config saved to /var/cache/conftool/dbconfig/20250604-072030-root.json
  • 07:19 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7001.wikimedia.org to plain
  • 07:18 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7001.wikimedia.org to plain
  • 07:16 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7001.magru.wmnet to plain
  • 07:13 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7001.magru.wmnet to plain
  • 07:09 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install7001.wikimedia.org to plain
  • 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'es2048 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76981 and previous config saved to /var/cache/conftool/dbconfig/20250604-070910-root.json
  • 07:08 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of install7001.wikimedia.org to plain
  • 07:05 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76980 and previous config saved to /var/cache/conftool/dbconfig/20250604-070525-root.json
  • 07:01 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7001.magru.wmnet
  • 06:57 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7001.magru.wmnet
  • 06:54 marostegui@cumin1002: dbctl commit (dc=all): 'es2048 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76979 and previous config saved to /var/cache/conftool/dbconfig/20250604-065405-root.json
  • 06:50 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76978 and previous config saved to /var/cache/conftool/dbconfig/20250604-065020-root.json
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'es2048 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76977 and previous config saved to /var/cache/conftool/dbconfig/20250604-063900-root.json
  • 06:35 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76976 and previous config saved to /var/cache/conftool/dbconfig/20250604-063515-root.json
  • 06:33 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 06:33 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 06:31 marostegui@deploy1003: Finished scap sync-world: Backport for Revert "db-production.php: Disable writes on es7" (duration: 09m 52s)
  • 06:24 marostegui@deploy1003: marostegui: Continuing with sync
  • 06:24 marostegui@deploy1003: marostegui: Backport for Revert "db-production.php: Disable writes on es7" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 06:23 marostegui@cumin1002: dbctl commit (dc=all): 'es2048 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76974 and previous config saved to /var/cache/conftool/dbconfig/20250604-062355-root.json
  • 06:21 marostegui@deploy1003: Started scap sync-world: Backport for Revert "db-production.php: Disable writes on es7"
  • 06:20 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P76973 and previous config saved to /var/cache/conftool/dbconfig/20250604-062010-root.json
  • 06:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1035 T395982', diff saved to https://phabricator.wikimedia.org/P76972 and previous config saved to /var/cache/conftool/dbconfig/20250604-060413-marostegui.json
  • 06:03 marostegui@dns1006: END - running authdns-update
  • 06:03 marostegui@dns1006: START - running authdns-update
  • 06:02 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es1039 to es7 primary T395982', diff saved to https://phabricator.wikimedia.org/P76971 and previous config saved to /var/cache/conftool/dbconfig/20250604-060246-marostegui.json
  • 06:02 marostegui: Starting es7 eqiad failover from es1035 to es1039 - T395982
  • 05:57 marostegui@cumin1002: dbctl commit (dc=all): 'Set es1039 with weight 0 T395982', diff saved to https://phabricator.wikimedia.org/P76970 and previous config saved to /var/cache/conftool/dbconfig/20250604-055744-marostegui.json
  • 05:56 marostegui@deploy1003: Finished scap sync-world: Backport for db-production.php: Disable writes on es7 (T395982) (duration: 13m 00s)
  • 05:49 marostegui@deploy1003: marostegui: Continuing with sync
  • 05:45 marostegui@deploy1003: marostegui: Backport for db-production.php: Disable writes on es7 (T395982) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 05:43 marostegui@deploy1003: Started scap sync-world: Backport for db-production.php: Disable writes on es7 (T395982)
  • 05:42 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es7 T395982
  • 00:38 eileen: civicrm upgraded from 8eb67a94 to 22171c0b

2025-06-03

  • 22:42 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:39 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 22:10 eileen: civicrm upgraded from 3b59e784 to 8eb67a94
  • 22:09 sbassett@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 21:59 sbassett@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 21:53 tzatziki: removing 4 files for legal compliance
  • 21:48 sbassett@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 21:41 tzatziki: removing 2 files for legal compliance
  • 21:38 sbassett@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 21:38 sbassett@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 21:28 sbassett@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 21:21 mstyles@deploy1003: Finished scap sync-world: Backport for Revert^2 "OATHAuth: Mark checkuser and suppress as requiring 2FA" (T150898) (duration: 11m 31s)
  • 21:18 bvibber@deploy1003: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
  • 21:14 mstyles@deploy1003: mstyles, sbassett: Continuing with sync
  • 21:11 mstyles@deploy1003: mstyles, sbassett: Backport for Revert^2 "OATHAuth: Mark checkuser and suppress as requiring 2FA" (T150898) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:09 mstyles@deploy1003: Started scap sync-world: Backport for Revert^2 "OATHAuth: Mark checkuser and suppress as requiring 2FA" (T150898)
  • 21:03 cjming@deploy1003: Finished scap sync-world: Backport for Use default preference if no client preference in auth request (T395957) (duration: 09m 49s)
  • 20:56 cjming@deploy1003: matmarex, cjming: Continuing with sync
  • 20:55 cjming@deploy1003: matmarex, cjming: Backport for Use default preference if no client preference in auth request (T395957) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:53 cjming@deploy1003: Started scap sync-world: Backport for Use default preference if no client preference in auth request (T395957)
  • 20:46 marostegui@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of es2040.codfw.wmnet onto es2048.codfw.wmnet
  • 20:46 marostegui@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) es2048 gradually with 4 steps - Pool es2048.codfw.wmnet in after cloning
  • 20:37 cscott@deploy1003: Finished scap sync-world: Backport for Use ::getContentId() and ::clearContentId() from the Parsoid extension API (duration: 12m 41s)
  • 20:31 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250526/wiki=wikidata/scope=wikidata_main/ using stat1009.eqiad.wmnet)
  • 20:30 cscott@deploy1003: cscott: Continuing with sync
  • 20:27 cscott@deploy1003: cscott: Backport for Use ::getContentId() and ::clearContentId() from the Parsoid extension API synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:25 cscott@deploy1003: Started scap sync-world: Backport for Use ::getContentId() and ::clearContentId() from the Parsoid extension API
  • 20:18 cjming@deploy1003: Finished scap sync-world: Backport for Deploy survey to en at twenty percent (T389393) (duration: 11m 18s)
  • 20:11 cjming@deploy1003: ksarabia, cjming: Continuing with sync
  • 20:09 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250526/wiki=wikidata/scope=scholarly_articles/ using stat1011.eqiad.wmnet)
  • 20:08 cjming@deploy1003: ksarabia, cjming: Backport for Deploy survey to en at twenty percent (T389393) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:06 cjming@deploy1003: Started scap sync-world: Backport for Deploy survey to en at twenty percent (T389393)
  • 19:37 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs1020.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf/date=20250526/wiki=wikidata/ using stat1008.eqiad.wmnet)
  • 19:37 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs1020.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf/date=20250526/wiki=wikidata/ using stat1008.eqiad.wmnet)
  • 19:36 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs1020.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf/date=20250526/wiki=wikidata/ using stat1010.eqiad.wmnet)
  • 19:36 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs1020.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf/date=20250526/wiki=wikidata/ using stat1010.eqiad.wmnet)
  • 19:35 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata on wdqs1020.eqiad.wmnet from DumpsSource.NFS (munging data to /srv/wdqs/munged, /srv/wdqs/lex-munged)
  • 19:35 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata on wdqs1020.eqiad.wmnet from DumpsSource.NFS (munging data to /srv/wdqs/munged, /srv/wdqs/lex-munged)
  • 19:22 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.4 refs T392174
  • 19:11 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250526/wiki=wikidata/scope=scholarly_articles/ using stat1011.eqiad.wmnet)
  • 19:08 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250526/wiki=wikidata/scope=scholarly_articles/ using stat1009.eqiad.wmnet)
  • 19:07 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250526/wiki=wikidata/scope=scholarly_articles/ using stat1009.eqiad.wmnet)
  • 19:05 ebernhardson@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: 0.3.157 (duration: 17m 57s)
  • 19:00 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250526/wiki=wikidata/scope=wikidata_main/ using stat1009.eqiad.wmnet)
  • 18:47 ebernhardson@deploy1003: Started deploy [wdqs/wdqs@fea7794]: 0.3.157
  • 18:08 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1003.eqiad.wmnet
  • 18:00 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet
  • 17:50 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 17:50 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 17:45 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 17:45 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 17:42 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 17:41 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 17:39 bvibber@deploy1003: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
  • 17:39 bvibber@deploy1003: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
  • 17:39 isaranto@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 17:39 bvibber@deploy1003: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
  • 17:38 bvibber@deploy1003: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
  • 17:38 bvibber@deploy1003: helmfile [staging] START helmfile.d/services/chart-renderer: apply
  • 17:36 swfrench@deploy1003: Finished scap sync-world: Scap test run after revert - T389786 (duration: 02m 10s)
  • 17:35 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet,service=s4
  • 17:34 swfrench@deploy1003: Started scap sync-world: Scap test run after revert - T389786
  • 17:19 swfrench@deploy1003: Started scap sync-world: Scap run to test newly enabled dse-k8s-eqiad deployment - T388761 T389786
  • 17:17 bvibber@deploy1003: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
  • 17:16 bvibber@deploy1003: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
  • 17:16 bvibber@deploy1003: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
  • 17:15 cdanis@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 17:15 cdanis@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 17:14 bvibber@deploy1003: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
  • 17:14 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 17:13 bvibber@deploy1003: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
  • 17:13 bvibber@deploy1003: helmfile [staging] START helmfile.d/services/chart-renderer: apply
  • 17:12 cdanis@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 17:06 cdanis@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 17:06 cdanis@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 17:01 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 16:59 cdanis@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 16:57 fnegri@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1019.eqiad.wmnet with reason: Debugging stuck queries T390767
  • 16:42 bvibber@deploy1003: Finished scap sync-world: Backport for Fixes: Charts embedded in template rendering in Parsoid (T395462), Fixes: Charts embedded in template rendering in Parsoid (T395462) (duration: 09m 54s)
  • 16:38 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2010-dev.codfw.wmnet with OS bullseye
  • 16:35 bvibber@deploy1003: bvibber: Continuing with sync
  • 16:35 bvibber@deploy1003: bvibber: Backport for Fixes: Charts embedded in template rendering in Parsoid (T395462), Fixes: Charts embedded in template rendering in Parsoid (T395462) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:34 sukhe@dns1004: END - running authdns-update
  • 16:33 sukhe: testing dummy authdns-update to ensure clean run after gc-authdns-git-repo.timer run
  • 16:33 sukhe@dns1004: START - running authdns-update
  • 16:32 bvibber@deploy1003: Started scap sync-world: Backport for Fixes: Charts embedded in template rendering in Parsoid (T395462), Fixes: Charts embedded in template rendering in Parsoid (T395462)
  • 16:23 jiji@deploy1003: Finished scap sync-world: T276994: We merged a number of noop patches, sparing deployers the scary diffs (duration: 02m 58s)
  • 16:22 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 16:21 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 16:20 jiji@deploy1003: Started scap sync-world: T276994: We merged a number of noop patches, sparing deployers the scary diffs
  • 16:12 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 16:12 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 16:06 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 16:04 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 16:04 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:57 jiji@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:57 jiji@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:55 jiji@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:54 jiji@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:51 jiji@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:51 jiji@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:50 jiji@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:18 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bullseye
  • 15:18 moritzm: installing gcc-12 bugfix updates from Bookworm point releases (includes various run time libraries)
  • 15:17 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS bullseye
  • 15:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 15:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T395241)', diff saved to https://phabricator.wikimedia.org/P76966 and previous config saved to /var/cache/conftool/dbconfig/20250603-151552-fceratto.json
  • 15:10 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add prometheus7002 - jmm@cumin1003"
  • 15:10 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add prometheus7002 - jmm@cumin1003"
  • 15:10 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bullseye
  • 15:06 hashar: Restarted Gerrit due to issue with replication config | T395887
  • 15:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P76964 and previous config saved to /var/cache/conftool/dbconfig/20250603-150045-fceratto.json
  • 14:58 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add prometheus7002 - jmm@cumin1003"
  • 14:58 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add prometheus7002 - jmm@cumin1003"
  • 14:54 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add prometheus7002 - jmm@cumin1003"
  • 14:54 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add prometheus7002 - jmm@cumin1003"
  • 14:50 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s4
  • 14:46 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus7002.magru.wmnet with OS bookworm
  • 14:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P76963 and previous config saved to /var/cache/conftool/dbconfig/20250603-144538-fceratto.json
  • 14:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T395241)', diff saved to https://phabricator.wikimedia.org/P76962 and previous config saved to /var/cache/conftool/dbconfig/20250603-143031-fceratto.json
  • 14:27 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus7002.magru.wmnet with reason: host reimage
  • 14:26 jmm@cumin1003: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: elastic1063.eqiad.wmnet
  • 14:23 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus7002.magru.wmnet with reason: host reimage
  • 14:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T395241)', diff saved to https://phabricator.wikimedia.org/P76961 and previous config saved to /var/cache/conftool/dbconfig/20250603-142314-fceratto.json
  • 14:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 14:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T395241)', diff saved to https://phabricator.wikimedia.org/P76960 and previous config saved to /var/cache/conftool/dbconfig/20250603-142248-fceratto.json
  • 14:19 jmm@cumin1003: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: elastic1103.eqiad.wmnet
  • 14:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P76959 and previous config saved to /var/cache/conftool/dbconfig/20250603-140740-fceratto.json
  • 14:01 Amir1: dropping term store tables from s8 (T351820)
  • 14:01 Amir1: dropping term store tables from s8 (T351802)
  • 13:57 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host prometheus7002.magru.wmnet with OS bookworm
  • 13:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P76957 and previous config saved to /var/cache/conftool/dbconfig/20250603-135233-fceratto.json
  • 13:49 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76956 and previous config saved to /var/cache/conftool/dbconfig/20250603-134935-root.json
  • 13:44 phuedx: Disabled the SDS 2.4.11 Synthetic A/A Test in xLab
  • 13:42 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 13:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1048.eqiad.wmnet with OS bullseye
  • 13:38 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 13:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T395241)', diff saved to https://phabricator.wikimedia.org/P76954 and previous config saved to /var/cache/conftool/dbconfig/20250603-133725-fceratto.json
  • 13:34 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 13:34 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76953 and previous config saved to /var/cache/conftool/dbconfig/20250603-133429-root.json
  • 13:33 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 13:32 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:32 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:31 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:28 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T395241)', diff saved to https://phabricator.wikimedia.org/P76952 and previous config saved to /var/cache/conftool/dbconfig/20250603-132802-fceratto.json
  • 13:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 13:27 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T395241)', diff saved to https://phabricator.wikimedia.org/P76951 and previous config saved to /var/cache/conftool/dbconfig/20250603-132735-fceratto.json
  • 13:26 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:22 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:21 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:20 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2050.codfw.wmnet
  • 13:20 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:20 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76950 and previous config saved to /var/cache/conftool/dbconfig/20250603-131959-root.json
  • 13:19 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76949 and previous config saved to /var/cache/conftool/dbconfig/20250603-131923-root.json
  • 13:18 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:17 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:16 moritzm: installing libavif security updates
  • 13:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
  • 13:15 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 13:14 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2050.codfw.wmnet
  • 13:14 jgleeson: payments-wiki rolled back from def6c267 to 1a4ef678
  • 13:12 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 13:12 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
  • 13:12 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 13:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P76948 and previous config saved to /var/cache/conftool/dbconfig/20250603-131228-fceratto.json
  • 13:11 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 13:11 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:08 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:04 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76946 and previous config saved to /var/cache/conftool/dbconfig/20250603-130453-root.json
  • 13:04 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76945 and previous config saved to /var/cache/conftool/dbconfig/20250603-130418-root.json
  • 13:04 marostegui: Shutdown clouddb1016:x3 T390954
  • 13:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1016.eqiad.wmnet with reason: Setting up x3 T390954
  • 12:58 moritzm: uploaded wmf-laptop 1.0.2 to apt.wikimedia.org
  • 12:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P76943 and previous config saved to /var/cache/conftool/dbconfig/20250603-125721-fceratto.json
  • 12:49 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76942 and previous config saved to /var/cache/conftool/dbconfig/20250603-124948-root.json
  • 12:49 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76941 and previous config saved to /var/cache/conftool/dbconfig/20250603-124913-root.json
  • 12:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T395241)', diff saved to https://phabricator.wikimedia.org/P76940 and previous config saved to /var/cache/conftool/dbconfig/20250603-124214-fceratto.json
  • 12:40 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2049.codfw.wmnet
  • 12:35 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2049.codfw.wmnet
  • 12:34 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76939 and previous config saved to /var/cache/conftool/dbconfig/20250603-123442-root.json
  • 12:34 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76938 and previous config saved to /var/cache/conftool/dbconfig/20250603-123407-root.json
  • 12:33 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T395241)', diff saved to https://phabricator.wikimedia.org/P76937 and previous config saved to /var/cache/conftool/dbconfig/20250603-123357-fceratto.json
  • 12:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 12:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T395241)', diff saved to https://phabricator.wikimedia.org/P76936 and previous config saved to /var/cache/conftool/dbconfig/20250603-123331-fceratto.json
  • 12:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
  • 12:25 marostegui@cumin1002: START - Cookbook sre.mysql.pool es2048 gradually with 4 steps - Pool es2048.codfw.wmnet in after cloning
  • 12:19 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76935 and previous config saved to /var/cache/conftool/dbconfig/20250603-121937-root.json
  • 12:19 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76934 and previous config saved to /var/cache/conftool/dbconfig/20250603-121902-root.json
  • 12:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P76933 and previous config saved to /var/cache/conftool/dbconfig/20250603-121824-fceratto.json
  • 12:16 marostegui@deploy1003: Finished scap sync-world: Backport for Revert "db-production.php: Disable writes on es7" (duration: 09m 47s)
  • 12:15 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
  • 12:12 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
  • 12:10 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 12:09 marostegui@deploy1003: marostegui: Continuing with sync
  • 12:09 marostegui@deploy1003: marostegui: Backport for Revert "db-production.php: Disable writes on es7" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:08 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 12:07 claime: Launching manual run of recount-categories cronjob - T395745
  • 12:06 marostegui@deploy1003: Started scap sync-world: Backport for Revert "db-production.php: Disable writes on es7"
  • 12:06 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 12:05 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 12:04 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
  • 12:04 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76931 and previous config saved to /var/cache/conftool/dbconfig/20250603-120431-root.json
  • 12:03 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P76930 and previous config saved to /var/cache/conftool/dbconfig/20250603-120356-root.json
  • 12:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P76929 and previous config saved to /var/cache/conftool/dbconfig/20250603-120316-fceratto.json
  • 12:02 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
  • 12:00 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 11:59 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 11:56 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 11:55 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 11:54 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 11:52 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 11:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2038 T395785', diff saved to https://phabricator.wikimedia.org/P76927 and previous config saved to /var/cache/conftool/dbconfig/20250603-115026-marostegui.json
  • 11:49 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
  • 11:49 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2039 to es7 primary and set section read-write T395785', diff saved to https://phabricator.wikimedia.org/P76926 and previous config saved to /var/cache/conftool/dbconfig/20250603-114917-marostegui.json
  • 11:48 marostegui: Starting es7 codfw failover from es2038 to es2039 - T395785
  • 11:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T395241)', diff saved to https://phabricator.wikimedia.org/P76925 and previous config saved to /var/cache/conftool/dbconfig/20250603-114809-fceratto.json
  • 11:46 marostegui@cumin1002: dbctl commit (dc=all): 'Set es2039 with weight 0 T395785', diff saved to https://phabricator.wikimedia.org/P76924 and previous config saved to /var/cache/conftool/dbconfig/20250603-114637-marostegui.json
  • 11:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es7 T395785
  • 11:46 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
  • 11:41 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2048.codfw.wmnet
  • 11:40 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 11:39 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T395241)', diff saved to https://phabricator.wikimedia.org/P76923 and previous config saved to /var/cache/conftool/dbconfig/20250603-113952-fceratto.json
  • 11:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1039 T395647', diff saved to https://phabricator.wikimedia.org/P76922 and previous config saved to /var/cache/conftool/dbconfig/20250603-113946-marostegui.json
  • 11:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 11:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T395241)', diff saved to https://phabricator.wikimedia.org/P76921 and previous config saved to /var/cache/conftool/dbconfig/20250603-113924-fceratto.json
  • 11:39 marostegui@deploy1003: Finished scap sync-world: Backport for db-production.php: Disable writes on es7 (T395647) (duration: 09m 56s)
  • 11:36 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2048.codfw.wmnet
  • 11:33 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 11:32 marostegui@deploy1003: marostegui: Continuing with sync
  • 11:31 marostegui@deploy1003: marostegui: Backport for db-production.php: Disable writes on es7 (T395647) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es7 T395785
  • 11:29 marostegui@deploy1003: Started scap sync-world: Backport for db-production.php: Disable writes on es7 (T395647)
  • 11:25 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2040 gradually with 4 steps - Pool es2040.codfw.wmnet in after cloning
  • 11:22 taavi@cumin1002: conftool action : set/weight=100:pooled=yes; selector: name=clouddb1016.eqiad.wmnet,service=x3
  • 11:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es7 T395785
  • 11:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P76919 and previous config saved to /var/cache/conftool/dbconfig/20250603-111822-fceratto.json
  • 11:09 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
  • 11:03 jgleeson: payments-wiki upgraded from 1a4ef678 to def6c267
  • 11:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P76917 and previous config saved to /var/cache/conftool/dbconfig/20250603-110315-fceratto.json
  • 10:57 phuedx: Enabling the SDS 2.4.11 Synthetic A/A Test in xLab
  • 10:57 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2047.codfw.wmnet
  • 10:52 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2047.codfw.wmnet
  • 10:49 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2046.codfw.wmnet
  • 10:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T395241)', diff saved to https://phabricator.wikimedia.org/P76915 and previous config saved to /var/cache/conftool/dbconfig/20250603-104809-fceratto.json
  • 10:44 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views
  • 10:44 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2046.codfw.wmnet
  • 10:43 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
  • 10:43 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2045.codfw.wmnet
  • 10:40 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T395241)', diff saved to https://phabricator.wikimedia.org/P76914 and previous config saved to /var/cache/conftool/dbconfig/20250603-104056-fceratto.json
  • 10:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 10:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T395241)', diff saved to https://phabricator.wikimedia.org/P76913 and previous config saved to /var/cache/conftool/dbconfig/20250603-104030-fceratto.json
  • 10:40 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2045.codfw.wmnet
  • 10:38 cmooney@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 11 hosts with reason: silence alerts due to down BGP groups on cr2-codfw while PIC is reconfigured
  • 10:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P76912 and previous config saved to /var/cache/conftool/dbconfig/20250603-102523-fceratto.json
  • 10:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Fix the repool dbctl commit', diff saved to https://phabricator.wikimedia.org/P76911 and previous config saved to /var/cache/conftool/dbconfig/20250603-102517-ladsgroup.json
  • 10:18 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views
  • 10:13 topranks: drain cr2-codfw traffic to enable PIC port bw rebalance on slot 0 T387504
  • 09:59 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus7002.magru.wmnet with OS bookworm
  • 09:43 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner1002.eqiad.wmnet with OS bookworm
  • 09:25 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner1002.eqiad.wmnet with reason: host reimage
  • 09:22 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner1002.eqiad.wmnet with reason: host reimage
  • 09:22 elukey: puppet cert destroy {mobileapps,proton,recommendation-api}.discovery.wmnet on puppetmaster1001 - old certs not used anymore
  • 09:18 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:18 marostegui@cumin1002: START - Cookbook sre.mysql.pool es2040 gradually with 4 steps - Pool es2040.codfw.wmnet in after cloning
  • 09:18 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:16 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:16 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P76909 and previous config saved to /var/cache/conftool/dbconfig/20250603-091521-fceratto.json
  • 09:10 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:10 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:06 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2008.codfw.wmnet with OS bullseye
  • 09:06 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
  • 09:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T395241)', diff saved to https://phabricator.wikimedia.org/P76907 and previous config saved to /var/cache/conftool/dbconfig/20250603-090013-fceratto.json
  • 08:59 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
  • 08:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T395241)', diff saved to https://phabricator.wikimedia.org/P76906 and previous config saved to /var/cache/conftool/dbconfig/20250603-085148-fceratto.json
  • 08:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 08:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T395241)', diff saved to https://phabricator.wikimedia.org/P76905 and previous config saved to /var/cache/conftool/dbconfig/20250603-085121-fceratto.json
  • 08:45 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 08:43 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 08:42 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2008.codfw.wmnet with reason: host reimage
  • 08:41 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 08:40 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:38 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2008.codfw.wmnet with reason: host reimage
  • 08:37 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 08:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P76904 and previous config saved to /var/cache/conftool/dbconfig/20250603-083614-fceratto.json
  • 08:36 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:34 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host prometheus7002.magru.wmnet with OS bookworm
  • 08:30 jelto@cumin1002: START - Cookbook sre.hosts.reimage for host gitlab-runner1002.eqiad.wmnet with OS bookworm
  • 08:22 moritzm: rearm keyholder on cumin1003 following reboot
  • 08:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P76903 and previous config saved to /var/cache/conftool/dbconfig/20250603-082107-fceratto.json
  • 08:16 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be2008.codfw.wmnet with OS bullseye
  • 08:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin1003.eqiad.wmnet
  • 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1003.eqiad.wmnet
  • 08:10 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2009.codfw.wmnet with OS bullseye
  • 08:10 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
  • 08:06 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
  • 08:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T395241)', diff saved to https://phabricator.wikimedia.org/P76901 and previous config saved to /var/cache/conftool/dbconfig/20250603-080600-fceratto.json
  • 08:04 phuedx: Disabling the SDS 2.4.11 Synthetic A/A Test in xLab
  • 07:57 phuedx: Enabling the SDS 2.4.11 Synthetic A/A Test in xLab
  • 07:56 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T395241)', diff saved to https://phabricator.wikimedia.org/P76900 and previous config saved to /var/cache/conftool/dbconfig/20250603-075638-fceratto.json
  • 07:56 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 07:56 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host prometheus7002.magru.wmnet
  • 07:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T395241)', diff saved to https://phabricator.wikimedia.org/P76899 and previous config saved to /var/cache/conftool/dbconfig/20250603-075622-fceratto.json
  • 07:56 jmm@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 07:49 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2009.codfw.wmnet with reason: host reimage
  • 07:49 jmm@cumin1003: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master
  • 07:46 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 07:46 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host prometheus7002.magru.wmnet
  • 07:46 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2009.codfw.wmnet with reason: host reimage
  • 07:44 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors
  • 07:44 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors
  • 07:43 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast7002.wikimedia.org
  • 07:43 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast7002.wikimedia.org with OS bookworm
  • 07:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P76898 and previous config saved to /var/cache/conftool/dbconfig/20250603-074113-fceratto.json
  • 07:38 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors
  • 07:38 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors
  • 07:37 jmm@cumin1003: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master
  • 07:27 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast7002.wikimedia.org with reason: host reimage
  • 07:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P76897 and previous config saved to /var/cache/conftool/dbconfig/20250603-072604-fceratto.json
  • 07:25 tchanders@deploy1003: Finished scap sync-world: Backport for Assign IP auto-reveal rights to certain groups (T386492) (duration: 10m 39s)
  • 07:23 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be2009.codfw.wmnet with OS bullseye
  • 07:23 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on bast7002.wikimedia.org with reason: host reimage
  • 07:18 tchanders@deploy1003: tchanders: Continuing with sync
  • 07:16 tchanders@deploy1003: tchanders: Backport for Assign IP auto-reveal rights to certain groups (T386492) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:14 tchanders@deploy1003: Started scap sync-world: Backport for Assign IP auto-reveal rights to certain groups (T386492)
  • 07:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T395241)', diff saved to https://phabricator.wikimedia.org/P76896 and previous config saved to /var/cache/conftool/dbconfig/20250603-071057-fceratto.json
  • 07:06 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 07:06 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 07:02 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 07:02 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76895 and previous config saved to /var/cache/conftool/dbconfig/20250603-070155-root.json
  • 07:00 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T395241)', diff saved to https://phabricator.wikimedia.org/P76894 and previous config saved to /var/cache/conftool/dbconfig/20250603-070036-fceratto.json
  • 07:00 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 07:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T395241)', diff saved to https://phabricator.wikimedia.org/P76893 and previous config saved to /var/cache/conftool/dbconfig/20250603-070021-fceratto.json
  • 06:58 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host bast7002.wikimedia.org with OS bookworm
  • 06:54 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast7002.wikimedia.org - jmm@cumin1003"
  • 06:54 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast7002.wikimedia.org - jmm@cumin1003"
  • 06:54 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast7002.wikimedia.org on all recursors
  • 06:54 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache bast7002.wikimedia.org on all recursors
  • 06:54 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:54 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast7002.wikimedia.org - jmm@cumin1003"
  • 06:53 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast7002.wikimedia.org - jmm@cumin1003"
  • 06:48 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2009.codfw.wmnet with OS bullseye
  • 06:46 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76892 and previous config saved to /var/cache/conftool/dbconfig/20250603-064649-root.json
  • 06:45 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 06:45 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host bast7002.wikimedia.org
  • 06:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P76891 and previous config saved to /var/cache/conftool/dbconfig/20250603-064513-fceratto.json
  • 06:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76890 and previous config saved to /var/cache/conftool/dbconfig/20250603-064147-root.json
  • 06:37 marostegui: Decrease buffer size on clouddb1016:s8 T390954
  • 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76889 and previous config saved to /var/cache/conftool/dbconfig/20250603-063144-root.json
  • 06:30 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1154.eqiad.wmnet with reason: Setting up x3 T390954
  • 06:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P76887 and previous config saved to /var/cache/conftool/dbconfig/20250603-063004-fceratto.json
  • 06:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1016.eqiad.wmnet with reason: Setting up x3 T390954
  • 06:26 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76886 and previous config saved to /var/cache/conftool/dbconfig/20250603-062641-root.json
  • 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76885 and previous config saved to /var/cache/conftool/dbconfig/20250603-061638-root.json
  • 06:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T395241)', diff saved to https://phabricator.wikimedia.org/P76884 and previous config saved to /var/cache/conftool/dbconfig/20250603-061457-fceratto.json
  • 06:11 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76883 and previous config saved to /var/cache/conftool/dbconfig/20250603-061134-root.json
  • 06:07 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T395241)', diff saved to https://phabricator.wikimedia.org/P76882 and previous config saved to /var/cache/conftool/dbconfig/20250603-060719-fceratto.json
  • 06:07 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76881 and previous config saved to /var/cache/conftool/dbconfig/20250603-060132-root.json
  • 05:56 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76880 and previous config saved to /var/cache/conftool/dbconfig/20250603-055628-root.json
  • 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76879 and previous config saved to /var/cache/conftool/dbconfig/20250603-054626-root.json
  • 05:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76878 and previous config saved to /var/cache/conftool/dbconfig/20250603-054123-root.json
  • 05:39 marostegui@deploy1003: Finished scap sync-world: Backport for Revert "db-production.php: Disable writes on es6" (duration: 09m 52s)
  • 05:32 marostegui@deploy1003: marostegui: Continuing with sync
  • 05:31 marostegui@cumin1002: dbctl commit (dc=all): 'Give some weight to es1038', diff saved to https://phabricator.wikimedia.org/P76877 and previous config saved to /var/cache/conftool/dbconfig/20250603-053151-marostegui.json
  • 05:31 marostegui@deploy1003: marostegui: Backport for Revert "db-production.php: Disable writes on es6" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 05:29 marostegui@deploy1003: Started scap sync-world: Backport for Revert "db-production.php: Disable writes on es6"
  • 05:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1037 T395867', diff saved to https://phabricator.wikimedia.org/P76876 and previous config saved to /var/cache/conftool/dbconfig/20250603-052719-marostegui.json
  • 05:27 marostegui@dns1006: END - running authdns-update
  • 05:26 marostegui@dns1006: START - running authdns-update
  • 05:26 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es1038 to es6 primary T395867', diff saved to https://phabricator.wikimedia.org/P76875 and previous config saved to /var/cache/conftool/dbconfig/20250603-052614-marostegui.json
  • 05:25 marostegui: Starting es6 eqiad failover from es1037 to es1038 - T395867
  • 05:23 marostegui@cumin1002: dbctl commit (dc=all): 'Set es1038 with weight 0 T395867', diff saved to https://phabricator.wikimedia.org/P76874 and previous config saved to /var/cache/conftool/dbconfig/20250603-052353-marostegui.json
  • 05:22 marostegui@deploy1003: Finished scap sync-world: Backport for db-production.php: Disable writes on es6 (T395867) (duration: 13m 39s)
  • 05:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es6 T395867
  • 05:14 marostegui@deploy1003: marostegui: Continuing with sync
  • 05:13 marostegui@deploy1003: marostegui: Backport for db-production.php: Disable writes on es6 (T395867) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 05:09 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es6 T395867
  • 05:09 marostegui@deploy1003: Started scap sync-world: Backport for db-production.php: Disable writes on es6 (T395867)
  • 04:54 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2037.codfw.wmnet with reason: Primary switchover es6 T395420
  • 04:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2037 T395420', diff saved to https://phabricator.wikimedia.org/P76873 and previous config saved to /var/cache/conftool/dbconfig/20250603-045251-marostegui.json
  • 04:52 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2035 to es6 primary and set section read-write T395420', diff saved to https://phabricator.wikimedia.org/P76872 and previous config saved to /var/cache/conftool/dbconfig/20250603-045202-root.json
  • 04:51 marostegui: Starting es6 codfw failover from es2037 to es2035 - T395420
  • 04:49 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es6 T395420
  • 04:48 marostegui@cumin1002: dbctl commit (dc=all): 'Set es2035 with weight 0 T395420', diff saved to https://phabricator.wikimedia.org/P76871 and previous config saved to /var/cache/conftool/dbconfig/20250603-044855-root.json
  • 04:45 marostegui@cumin1002: dbctl commit (dc=all): 'Add es2048 to dbctl depooled T395771+', diff saved to https://phabricator.wikimedia.org/P76870 and previous config saved to /var/cache/conftool/dbconfig/20250603-044550-marostegui.json
  • 04:40 marostegui@cumin1002: START - Cookbook sre.mysql.clone of es2040.codfw.wmnet onto es2048.codfw.wmnet
  • 04:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2040 T395771', diff saved to https://phabricator.wikimedia.org/P76869 and previous config saved to /var/cache/conftool/dbconfig/20250603-043151-marostegui.json
  • 04:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2040.codfw.wmnet with reason: Maintenance
  • 04:01 mwpresync@deploy1003: Pruned MediaWiki: 1.45.0-wmf.1 (duration: 01m 39s)
  • 03:48 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.45.0-wmf.4 refs T392174 (duration: 45m 55s)
  • 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.4 refs T392174
  • 00:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
  • 00:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage

2025-06-02

  • 23:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2008.codfw.wmnet with reason: host reimage
  • 23:50 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2008.codfw.wmnet with reason: host reimage
  • 23:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be2008.codfw.wmnet with OS bullseye
  • 23:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
  • 23:24 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1051.eqiad.wmnet with OS bullseye
  • 23:24 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:24 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1050.eqiad.wmnet with OS bullseye
  • 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:22 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:10 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1049.eqiad.wmnet with OS bullseye
  • 23:10 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:09 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 22:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1051.eqiad.wmnet with reason: host reimage
  • 22:50 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1051.eqiad.wmnet with reason: host reimage
  • 22:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1050.eqiad.wmnet with reason: host reimage
  • 22:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1050.eqiad.wmnet with reason: host reimage
  • 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1049.eqiad.wmnet with reason: host reimage
  • 22:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1049.eqiad.wmnet with reason: host reimage
  • 22:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1051.eqiad.wmnet with OS bullseye
  • 22:16 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1048.eqiad.wmnet with OS bullseye
  • 22:08 maryum: scap sync-world finished to deploy several security bugs and PrivateSettings.php changes
  • 22:01 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1050.eqiad.wmnet with OS bullseye
  • 21:53 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1049.eqiad.wmnet with OS bullseye
  • 21:38 tgr@deploy1003: Unlocked for deployment [MediaWiki]: T395758 (duration: 22m 32s)
  • 21:38 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
  • 21:34 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
  • 21:22 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cirrussearch205*,cirrussearch2060* for T395855 - bking@cumin2002
  • 21:22 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cirrussearch205*,cirrussearch2060* for T395855 - bking@cumin2002
  • 21:16 bking@cumin2002: conftool action : set/pooled=no; selector: name=cirrussearch2055.codfw.wmnet|cirrussearch2056.codfw.wmnet|cirrussearch2057.codfw.wmnet|cirrussearch2058.codfw.wmnet|cirrussearch2059.codfw.wmnet|cirrussearch2060.codfw.wmnet|cirrussearch2091.codfw.wmnet
  • 21:16 tgr@deploy1003: Locking from deployment [MediaWiki]: T395758
  • 21:13 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch.*.codfw.wmnet
  • 21:11 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
  • 21:06 cjming@deploy1003: Finished scap sync-world: Backport for Simple summaries survey for English (T389393) (duration: 11m 41s)
  • 21:05 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1049.eqiad.wmnet with OS bullseye
  • 21:04 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1049.eqiad.wmnet with OS bullseye
  • 20:59 cjming@deploy1003: cjming, ksarabia: Continuing with sync
  • 20:59 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1048.eqiad.wmnet with OS bullseye
  • 20:59 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
  • 20:56 cjming@deploy1003: cjming, ksarabia: Backport for Simple summaries survey for English (T389393) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:56 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 20:55 cjming@deploy1003: Started scap sync-world: Backport for Simple summaries survey for English (T389393)
  • 20:51 jsn@deploy1003: Finished scap sync-world: Backport for Undeploy first set of Patroller Tools surveys (T389401) (duration: 12m 55s)
  • 20:45 jsn@deploy1003: jsn: Continuing with sync
  • 20:41 jsn@deploy1003: jsn: Backport for Undeploy first set of Patroller Tools surveys (T389401) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:38 jsn@deploy1003: Started scap sync-world: Backport for Undeploy first set of Patroller Tools surveys (T389401)
  • 20:36 arlolra@deploy1003: Finished scap sync-world: Backport for Remove wgParserEnableLegacyHeadingDOM option (T371756) (duration: 10m 37s)
  • 20:29 arlolra@deploy1003: arlolra: Continuing with sync
  • 20:27 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1028
  • 20:27 arlolra@deploy1003: arlolra: Backport for Remove wgParserEnableLegacyHeadingDOM option (T371756) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:27 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mx-in2001.wikimedia.org with reason: T395240
  • 20:26 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mx-in1001.wikimedia.org with reason: T395240
  • 20:26 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1028
  • 20:25 arlolra@deploy1003: Started scap sync-world: Backport for Remove wgParserEnableLegacyHeadingDOM option (T371756)
  • 20:23 cjming@deploy1003: Finished scap sync-world: Backport for ext.xLab: Send limited copies of stream configs (T391988) (duration: 15m 51s)
  • 20:22 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mx-out1001.wikimedia.org with reason: T395240
  • 20:18 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
  • 20:16 cjming@deploy1003: cjming, phuedx: Continuing with sync
  • 20:16 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mx-out2001.wikimedia.org with reason: T395240
  • 20:10 cjming@deploy1003: cjming, phuedx: Backport for ext.xLab: Send limited copies of stream configs (T391988) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:07 cjming@deploy1003: Started scap sync-world: Backport for ext.xLab: Send limited copies of stream configs (T391988)
  • 19:40 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 19:40 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 19:36 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
  • 19:32 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
  • 19:20 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting P{lvs3008.esams.wmnet} and A:liberica
  • 19:20 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P{lvs3008.esams.wmnet} and A:liberica
  • 19:20 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin pooling P{lvs3008.esams.wmnet} and A:liberica
  • 19:20 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs3008.esams.wmnet} and A:liberica
  • 19:20 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin depooling P{lvs3008.esams.wmnet} and A:liberica
  • 19:19 sukhe@cumin1002: START - Cookbook sre.loadbalancer.upgrade restarting P{lvs3008.esams.wmnet} and A:liberica
  • 19:15 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting P{lvs3009.esams.wmnet} and A:liberica
  • 19:15 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P{lvs3009.esams.wmnet} and A:liberica
  • 19:14 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin pooling P{lvs3009.esams.wmnet} and A:liberica
  • 19:14 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs3009.esams.wmnet} and A:liberica
  • 19:14 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin depooling P{lvs3009.esams.wmnet} and A:liberica
  • 19:14 sukhe@cumin1002: START - Cookbook sre.loadbalancer.upgrade restarting P{lvs3009.esams.wmnet} and A:liberica
  • 19:08 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
  • 19:06 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting P{lvs3010.esams.wmnet} and A:liberica
  • 19:06 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P{lvs3010.esams.wmnet} and A:liberica
  • 19:05 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin pooling P{lvs3010.esams.wmnet} and A:liberica
  • 19:05 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs3010.esams.wmnet} and A:liberica
  • 19:05 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin depooling P{lvs3010.esams.wmnet} and A:liberica
  • 19:05 sukhe@cumin1002: START - Cookbook sre.loadbalancer.upgrade restarting P{lvs3010.esams.wmnet} and A:liberica
  • 18:46 jasmine@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wikikube-worker1028.eqiad.wmnet
  • 18:46 jasmine@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:44 jasmine@cumin1002: START - Cookbook sre.dns.netbox
  • 18:34 jasmine@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-worker1028.eqiad.wmnet
  • 18:33 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp1104.eqiad.wmnet,service=(cdn|ats-be)
  • 18:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T395241)', diff saved to https://phabricator.wikimedia.org/P76866 and previous config saved to /var/cache/conftool/dbconfig/20250602-183230-fceratto.json
  • 18:24 brett: include libvmod-wmfuniq 0.2.0~deb11u1 in bullseye-wikimedia
  • 18:23 brett: include libvmod-wmfuniq 0.2.0~deb12u1 in bookworm-wikimedia
  • 18:21 ebernhardson@deploy1003: Finished deploy [airflow-dags/search@443d0ab]: bump glent to 0.3.6 (duration: 00m 29s)
  • 18:21 ebernhardson@deploy1003: Started deploy [airflow-dags/search@443d0ab]: bump glent to 0.3.6
  • 18:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P76865 and previous config saved to /var/cache/conftool/dbconfig/20250602-181722-fceratto.json
  • 18:10 jasmine@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wikikube-worker[1026-1028].eqiad.wmnet
  • 18:10 jasmine@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:10 jasmine@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[1026-1028].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin1002"
  • 18:07 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
  • 18:06 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1048.eqiad.wmnet with OS bullseye
  • 18:05 jasmine@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[1026-1028].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin1002"
  • 18:02 jasmine@cumin1002: START - Cookbook sre.dns.netbox
  • 18:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P76864 and previous config saved to /var/cache/conftool/dbconfig/20250602-180216-fceratto.json
  • 17:50 jasmine@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-worker[1026-1028].eqiad.wmnet
  • 17:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T395241)', diff saved to https://phabricator.wikimedia.org/P76863 and previous config saved to /var/cache/conftool/dbconfig/20250602-174708-fceratto.json
  • 17:38 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1254 (T395241)', diff saved to https://phabricator.wikimedia.org/P76862 and previous config saved to /var/cache/conftool/dbconfig/20250602-173850-fceratto.json
  • 17:38 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1254.eqiad.wmnet with reason: Maintenance
  • 17:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 17:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T395241)', diff saved to https://phabricator.wikimedia.org/P76861 and previous config saved to /var/cache/conftool/dbconfig/20250602-173316-fceratto.json
  • 17:22 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
  • 17:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250602-171804-fceratto.json
  • 17:05 jasmine@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1026-1028].eqiad.wmnet
  • 17:04 jasmine@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1026-1028].eqiad.wmnet
  • 17:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P76860 and previous config saved to /var/cache/conftool/dbconfig/20250602-170256-fceratto.json
  • 16:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
  • 16:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T395241)', diff saved to https://phabricator.wikimedia.org/P76859 and previous config saved to /var/cache/conftool/dbconfig/20250602-164748-fceratto.json
  • 16:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol2010-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudcontrol2010-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:40 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T395241)', diff saved to https://phabricator.wikimedia.org/P76857 and previous config saved to /var/cache/conftool/dbconfig/20250602-164030-fceratto.json
  • 16:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 16:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
  • 16:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T395241)', diff saved to https://phabricator.wikimedia.org/P76856 and previous config saved to /var/cache/conftool/dbconfig/20250602-164003-fceratto.json
  • 16:36 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:36 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76855 and previous config saved to /var/cache/conftool/dbconfig/20250602-162957-root.json
  • 16:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P76854 and previous config saved to /var/cache/conftool/dbconfig/20250602-162455-fceratto.json
  • 16:22 sukhe: sudo cumin -b1 -s60 'A:cp and not P{cp7001*}' "depool cdn && sleep 10 && run-puppet-agent --enable 'merging CR 1091330' && systemctl restart trafficserver.service && sleep 10 && pool cdn"
  • 16:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76853 and previous config saved to /var/cache/conftool/dbconfig/20250602-161452-root.json
  • 16:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P76852 and previous config saved to /var/cache/conftool/dbconfig/20250602-160948-fceratto.json
  • 16:03 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet [reason: [end] testing CR 1091330]
  • 15:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76851 and previous config saved to /var/cache/conftool/dbconfig/20250602-155946-root.json
  • 15:55 sukhe: enable puppet and run agent on cp7001
  • 15:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T395241)', diff saved to https://phabricator.wikimedia.org/P76850 and previous config saved to /var/cache/conftool/dbconfig/20250602-155441-fceratto.json
  • 15:52 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet [reason: testing CR 1091330]
  • 15:50 sukhe: disable puppet on A:cp to merge CR: 1091330
  • 15:49 phuedx@deploy1003: Finished scap sync-world: Backport for Enable MetricsPlatform's experimentation feature (duration: 14m 23s)
  • 15:47 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:47 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T395241)', diff saved to https://phabricator.wikimedia.org/P76849 and previous config saved to /var/cache/conftool/dbconfig/20250602-154734-fceratto.json
  • 15:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 15:47 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T395241)', diff saved to https://phabricator.wikimedia.org/P76848 and previous config saved to /var/cache/conftool/dbconfig/20250602-154709-fceratto.json
  • 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2009.codfw.wmnet with reason: host reimage
  • 15:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76847 and previous config saved to /var/cache/conftool/dbconfig/20250602-154440-root.json
  • 15:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2009.codfw.wmnet with reason: host reimage
  • 15:42 phuedx@deploy1003: phuedx: Continuing with sync
  • 15:38 phuedx@deploy1003: phuedx: Backport for Enable MetricsPlatform's experimentation feature synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:35 phuedx@deploy1003: Started scap sync-world: Backport for Enable MetricsPlatform's experimentation feature
  • 15:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P76846 and previous config saved to /var/cache/conftool/dbconfig/20250602-153201-fceratto.json
  • 15:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76845 and previous config saved to /var/cache/conftool/dbconfig/20250602-152935-root.json
  • 15:27 joal@deploy1003: Finished deploy [airflow-dags/analytics@03db055]: Regular analytics weekly train (with pull...) [airflow-dags/analytics_test@03db0552] (duration: 00m 42s)
  • 15:26 joal@deploy1003: Started deploy [airflow-dags/analytics@03db055]: Regular analytics weekly train (with pull...) [airflow-dags/analytics_test@03db0552]
  • 15:21 thcipriani: jouncebot nowandnext
  • 15:19 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
  • 15:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P76844 and previous config saved to /var/cache/conftool/dbconfig/20250602-151654-fceratto.json
  • 15:15 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2009.codfw.wmnet with OS bullseye
  • 15:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76843 and previous config saved to /var/cache/conftool/dbconfig/20250602-151429-root.json
  • 15:03 joal@deploy1003: Finished deploy [airflow-dags/analytics_test@4ebb376]: Regular analytics weekly train [airflow-dags/analytics_test@4ebb376f] (duration: 00m 05s)
  • 15:03 joal@deploy1003: Started deploy [airflow-dags/analytics_test@4ebb376]: Regular analytics weekly train [airflow-dags/analytics_test@4ebb376f]
  • 15:03 joal@deploy1003: Finished deploy [airflow-dags/analytics@afad011]: Regular analytics weekly train [airflow-dags/main@afad011c] (duration: 00m 07s)
  • 15:03 joal@deploy1003: Started deploy [airflow-dags/analytics@afad011]: Regular analytics weekly train [airflow-dags/main@afad011c]
  • 15:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T395241)', diff saved to https://phabricator.wikimedia.org/P76842 and previous config saved to /var/cache/conftool/dbconfig/20250602-150146-fceratto.json
  • 15:00 phuedx: Disabled the SDS 2.4.11 Synthetic A/A Test in xLab
  • 14:54 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T395241)', diff saved to https://phabricator.wikimedia.org/P76841 and previous config saved to /var/cache/conftool/dbconfig/20250602-145443-fceratto.json
  • 14:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 14:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T395241)', diff saved to https://phabricator.wikimedia.org/P76840 and previous config saved to /var/cache/conftool/dbconfig/20250602-145418-fceratto.json
  • 14:54 joal@deploy1003: Finished deploy [analytics/refinery@b1aa837] (hadoop-test): Regular analytics weekly train test [analytics/refinery@b1aa837f] (duration: 09m 27s)
  • 14:44 joal@deploy1003: Started deploy [analytics/refinery@b1aa837] (hadoop-test): Regular analytics weekly train test [analytics/refinery@b1aa837f]
  • 14:44 joal@deploy1003: Finished deploy [analytics/refinery@b1aa837] (thin): Regular analytics weekly train THIN [analytics/refinery@b1aa837f] (duration: 01m 06s)
  • 14:43 joal@deploy1003: Started deploy [analytics/refinery@b1aa837] (thin): Regular analytics weekly train THIN [analytics/refinery@b1aa837f]
  • 14:42 joal@deploy1003: Finished deploy [analytics/refinery@b1aa837]: Regular analytics weekly train [analytics/refinery@b1aa837f] (duration: 03m 08s)
  • 14:39 joal@deploy1003: Started deploy [analytics/refinery@b1aa837]: Regular analytics weekly train [analytics/refinery@b1aa837f]
  • 14:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P76835 and previous config saved to /var/cache/conftool/dbconfig/20250602-143910-fceratto.json
  • 14:36 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[1154,1211].eqiad.wmnet with reason: Maintenance
  • 14:35 bking@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts elastic1067.eqiad.wmnet
  • 14:35 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:35 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: elastic1067.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
  • 14:35 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: elastic1067.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
  • 14:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P76833 and previous config saved to /var/cache/conftool/dbconfig/20250602-142403-fceratto.json
  • 14:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T395241)', diff saved to https://phabricator.wikimedia.org/P76832 and previous config saved to /var/cache/conftool/dbconfig/20250602-140854-fceratto.json
  • 14:04 phuedx: Enabling the SDS 2.4.11 Synthetic A/A Test in xLab
  • 13:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T395241)', diff saved to https://phabricator.wikimedia.org/P76831 and previous config saved to /var/cache/conftool/dbconfig/20250602-135945-fceratto.json
  • 13:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 13:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T395241)', diff saved to https://phabricator.wikimedia.org/P76830 and previous config saved to /var/cache/conftool/dbconfig/20250602-135920-fceratto.json
  • 13:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:49 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host prometheus7002.magru.wmnet
  • 13:49 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus7002.magru.wmnet with OS bookworm
  • 13:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P76829 and previous config saved to /var/cache/conftool/dbconfig/20250602-134413-fceratto.json
  • 13:43 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 13:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1049.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:38 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts elastic1067.eqiad.wmnet
  • 13:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cirrussearch[1064-1066].eqiad.wmnet
  • 13:37 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:37 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cirrussearch[1064-1066].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
  • 13:37 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cirrussearch[1064-1066].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
  • 13:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P76828 and previous config saved to /var/cache/conftool/dbconfig/20250602-132906-fceratto.json
  • 13:24 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:22 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 13:22 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for core-Namespaces: Add Page, Author to default search ns in ruwikisource (T395632) (duration: 12m 00s)
  • 13:21 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 13:20 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1049.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:19 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
  • 13:17 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:17 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for cloudcephosd1048-51 - jclark@cumin1002"
  • 13:16 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for cloudcephosd1048-51 - jclark@cumin1002"
  • 13:15 lucaswerkmeister-wmde@deploy1003: bunnypranav, lucaswerkmeister-wmde: Continuing with sync
  • 13:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T395241)', diff saved to https://phabricator.wikimedia.org/P76826 and previous config saved to /var/cache/conftool/dbconfig/20250602-131359-fceratto.json
  • 13:13 lucaswerkmeister-wmde@deploy1003: bunnypranav, lucaswerkmeister-wmde: Backport for core-Namespaces: Add Page, Author to default search ns in ruwikisource (T395632) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:13 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 13:10 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for core-Namespaces: Add Page, Author to default search ns in ruwikisource (T395632)
  • 13:09 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts cirrussearch[1064-1066].eqiad.wmnet
  • 13:05 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T395241)', diff saved to https://phabricator.wikimedia.org/P76825 and previous config saved to /var/cache/conftool/dbconfig/20250602-130548-fceratto.json
  • 13:05 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 13:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T395241)', diff saved to https://phabricator.wikimedia.org/P76824 and previous config saved to /var/cache/conftool/dbconfig/20250602-130523-fceratto.json
  • 12:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P76823 and previous config saved to /var/cache/conftool/dbconfig/20250602-125016-fceratto.json
  • 12:37 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of es2036.codfw.wmnet onto es2047.codfw.wmnet
  • 12:37 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2047 gradually with 4 steps - Pool es2047.codfw.wmnet in after cloning
  • 12:37 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2003.wikimedia.org
  • 12:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P76821 and previous config saved to /var/cache/conftool/dbconfig/20250602-123508-fceratto.json
  • 12:33 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host irc2003.wikimedia.org
  • 12:26 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host prometheus7002.magru.wmnet with OS bookworm
  • 12:25 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus7002.magru.wmnet - jmm@cumin1003"
  • 12:25 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus7002.magru.wmnet - jmm@cumin1003"
  • 12:24 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus7002.magru.wmnet on all recursors
  • 12:24 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache prometheus7002.magru.wmnet on all recursors
  • 12:24 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:24 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus7002.magru.wmnet - jmm@cumin1003"
  • 12:22 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus7002.magru.wmnet - jmm@cumin1003"
  • 12:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T395241)', diff saved to https://phabricator.wikimedia.org/P76819 and previous config saved to /var/cache/conftool/dbconfig/20250602-122001-fceratto.json
  • 12:17 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 12:17 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host prometheus7002.magru.wmnet
  • 12:12 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh7003.wikimedia.org
  • 12:12 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh7003.wikimedia.org with OS bookworm
  • 12:11 claime: cgoubert@mwmaint1002:~$ sudo systemctl restart mediawiki_job_generatecaptcha.service - T388531
  • 12:11 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt-staging2001.codfw.wmnet
  • 12:10 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of global-data-captcha-render
  • 12:10 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of global-data-captcha-render
  • 12:10 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T395241)', diff saved to https://phabricator.wikimedia.org/P76818 and previous config saved to /var/cache/conftool/dbconfig/20250602-121041-fceratto.json
  • 12:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 12:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T395241)', diff saved to https://phabricator.wikimedia.org/P76817 and previous config saved to /var/cache/conftool/dbconfig/20250602-121016-fceratto.json
  • 12:07 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host apt-staging2001.codfw.wmnet
  • 12:00 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2002.codfw.wmnet
  • 11:59 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
  • 11:57 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 11:55 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 11:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P76814 and previous config saved to /var/cache/conftool/dbconfig/20250602-115509-fceratto.json
  • 11:54 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host build2002.codfw.wmnet
  • 11:51 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh7003.wikimedia.org with reason: host reimage
  • 11:49 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on doh7003.wikimedia.org with reason: host reimage
  • 11:48 marostegui@cumin1002: START - Cookbook sre.mysql.pool es2047 gradually with 4 steps - Pool es2047.codfw.wmnet in after cloning
  • 11:47 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 11:46 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw1001.wikimedia.org
  • 11:42 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ldap-rw1001.wikimedia.org
  • 11:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P76812 and previous config saved to /var/cache/conftool/dbconfig/20250602-114001-fceratto.json
  • 11:39 claime: cgoubert@mwmaint1002:~$ sudo systemctl restart mediawiki_job_generatecaptcha.service - T388531
  • 11:37 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw2001.wikimedia.org
  • 11:33 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ldap-rw2001.wikimedia.org
  • 11:32 claime: Manual run of cronjobs/generatecaptcha on k8s - T388531
  • 11:31 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 11:30 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 11:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T395241)', diff saved to https://phabricator.wikimedia.org/P76811 and previous config saved to /var/cache/conftool/dbconfig/20250602-112453-fceratto.json
  • 11:23 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-maint1001.eqiad.wmnet
  • 11:19 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ldap-maint1001.eqiad.wmnet
  • 11:18 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host doh7003.wikimedia.org with OS bookworm
  • 11:17 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh7003.wikimedia.org - jmm@cumin1003"
  • 11:17 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh7003.wikimedia.org - jmm@cumin1003"
  • 11:17 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh7003.wikimedia.org on all recursors
  • 11:17 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache doh7003.wikimedia.org on all recursors
  • 11:17 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:17 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh7003.wikimedia.org - jmm@cumin1003"
  • 11:16 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh7003.wikimedia.org - jmm@cumin1003"
  • 11:15 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T395241)', diff saved to https://phabricator.wikimedia.org/P76810 and previous config saved to /var/cache/conftool/dbconfig/20250602-111519-fceratto.json
  • 11:15 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:14 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 11:11 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 11:11 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host doh7003.wikimedia.org
  • 11:10 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76809 and previous config saved to /var/cache/conftool/dbconfig/20250602-111044-root.json
  • 10:58 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum7003.magru.wmnet
  • 10:58 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum7003.magru.wmnet with OS bookworm
  • 10:58 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-maint2001.codfw.wmnet
  • 10:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76808 and previous config saved to /var/cache/conftool/dbconfig/20250602-105539-root.json
  • 10:54 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ldap-maint2001.codfw.wmnet
  • 10:48 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2036 gradually with 4 steps - Pool es2036.codfw.wmnet in after cloning
  • 10:41 kamila@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/jaeger: apply
  • 10:40 kamila@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/jaeger: apply
  • 10:40 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76806 and previous config saved to /var/cache/conftool/dbconfig/20250602-104032-root.json
  • 10:40 kamila@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/jaeger: apply
  • 10:40 kamila@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/jaeger: apply
  • 10:37 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum7003.magru.wmnet with reason: host reimage
  • 10:34 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum7003.magru.wmnet with reason: host reimage
  • 10:31 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1003.eqiad.wmnet
  • 10:27 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host debmonitor1003.eqiad.wmnet
  • 10:25 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76804 and previous config saved to /var/cache/conftool/dbconfig/20250602-102526-root.json
  • 10:22 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor2003.codfw.wmnet
  • 10:18 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host debmonitor2003.codfw.wmnet
  • 10:12 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
  • 10:10 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76802 and previous config saved to /var/cache/conftool/dbconfig/20250602-101020-root.json
  • 10:08 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
  • 10:07 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host durum7003.magru.wmnet with OS bookworm
  • 10:06 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum7003.magru.wmnet - jmm@cumin1003"
  • 10:06 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum7003.magru.wmnet - jmm@cumin1003"
  • 10:06 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum7003.magru.wmnet on all recursors
  • 10:06 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache durum7003.magru.wmnet on all recursors
  • 10:06 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:06 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum7003.magru.wmnet - jmm@cumin1003"
  • 10:06 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum7003.magru.wmnet - jmm@cumin1003"
  • 10:02 marostegui@cumin1002: START - Cookbook sre.mysql.pool es2036 gradually with 4 steps - Pool es2036.codfw.wmnet in after cloning
  • 10:02 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 10:02 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host durum7003.magru.wmnet
  • 09:59 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2003.codfw.wmnet
  • 09:59 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ncredir7003.magru.wmnet
  • 09:59 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir7003.magru.wmnet with OS bookworm
  • 09:55 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host rpki2003.codfw.wmnet
  • 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76800 and previous config saved to /var/cache/conftool/dbconfig/20250602-095514-root.json
  • 09:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2039.codfw.wmnet with reason: Maintenance
  • 09:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2039 T395647', diff saved to https://phabricator.wikimedia.org/P76798 and previous config saved to /var/cache/conftool/dbconfig/20250602-094402-marostegui.json
  • 09:40 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon1003.wikimedia.org
  • 09:33 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
  • 09:28 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir7003.magru.wmnet with reason: host reimage
  • 09:27 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2002.wikimedia.org
  • 09:24 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir7003.magru.wmnet with reason: host reimage
  • 09:22 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host netmon2002.wikimedia.org
  • 09:13 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet
  • 09:10 jelto: update gitlab-settings artifact retention to 6 months - T395014
  • 09:09 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet
  • 09:02 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet
  • 08:58 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ncredir7003.magru.wmnet with OS bookworm
  • 08:58 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet
  • 08:51 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7003.magru.wmnet - jmm@cumin1003"
  • 08:51 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7003.magru.wmnet - jmm@cumin1003"
  • 08:51 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir7003.magru.wmnet on all recursors
  • 08:51 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache ncredir7003.magru.wmnet on all recursors
  • 08:51 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:51 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7003.magru.wmnet - jmm@cumin1003"
  • 08:51 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7003.magru.wmnet - jmm@cumin1003"
  • 08:51 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76796 and previous config saved to /var/cache/conftool/dbconfig/20250602-085105-root.json
  • 08:47 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 08:47 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host ncredir7003.magru.wmnet
  • 08:45 jmm@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ncredir7003.magru.wmnet with OS bookworm
  • 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ncredir7003.magru.wmnet
  • 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:41 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:36 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76795 and previous config saved to /var/cache/conftool/dbconfig/20250602-083559-root.json
  • 08:33 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ncredir7003.magru.wmnet
  • 08:20 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76794 and previous config saved to /var/cache/conftool/dbconfig/20250602-082053-root.json
  • 08:11 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ncredir7003.magru.wmnet with OS bookworm
  • 08:05 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76793 and previous config saved to /var/cache/conftool/dbconfig/20250602-080547-root.json
  • 07:58 phuedx@deploy1003: Finished scap sync-world: Backport for Beta Cluster: Support A/B experiments (T393918) (duration: 35m 59s)
  • 07:50 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76792 and previous config saved to /var/cache/conftool/dbconfig/20250602-075041-root.json
  • 07:49 phuedx@deploy1003: phuedx, dr0ptp4kt: Continuing with sync
  • 07:38 phuedx@deploy1003: phuedx, dr0ptp4kt: Backport for Beta Cluster: Support A/B experiments (T393918) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:36 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7002.magru.wmnet
  • 07:35 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76791 and previous config saved to /var/cache/conftool/dbconfig/20250602-073535-root.json
  • 07:27 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti7002.magru.wmnet
  • 07:22 phuedx@deploy1003: Started scap sync-world: Backport for Beta Cluster: Support A/B experiments (T393918)
  • 07:20 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5004.wikimedia.org
  • 07:13 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: Maintenance
  • 07:13 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1040.eqiad.wmnet with reason: Maintenance
  • 07:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76790 and previous config saved to /var/cache/conftool/dbconfig/20250602-070837-root.json
  • 07:08 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4005.wikimedia.org
  • 07:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1040 T395647', diff saved to https://phabricator.wikimedia.org/P76789 and previous config saved to /var/cache/conftool/dbconfig/20250602-070602-marostegui.json
  • 07:02 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host bast4005.wikimedia.org
  • 06:59 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast2003.wikimedia.org
  • 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76787 and previous config saved to /var/cache/conftool/dbconfig/20250602-065331-root.json
  • 06:53 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host bast2003.wikimedia.org
  • 06:52 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet
  • 06:48 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet
  • 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76786 and previous config saved to /var/cache/conftool/dbconfig/20250602-063826-root.json
  • 06:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76785 and previous config saved to /var/cache/conftool/dbconfig/20250602-062320-root.json
  • 06:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76783 and previous config saved to /var/cache/conftool/dbconfig/20250602-060815-root.json
  • 05:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76782 and previous config saved to /var/cache/conftool/dbconfig/20250602-055309-root.json
  • 05:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1169 T395663', diff saved to https://phabricator.wikimedia.org/P76781 and previous config saved to /var/cache/conftool/dbconfig/20250602-053905-marostegui.json
  • 05:38 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 05:20 marostegui@cumin1002: dbctl commit (dc=all): 'Add es2047 to dbctl depooled T395771', diff saved to https://phabricator.wikimedia.org/P76780 and previous config saved to /var/cache/conftool/dbconfig/20250602-051957-marostegui.json
  • 05:15 marostegui@cumin1002: START - Cookbook sre.mysql.clone of es2036.codfw.wmnet onto es2047.codfw.wmnet
  • 05:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2036.codfw.wmnet with reason: Maintenance
  • 05:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2036 T395771', diff saved to https://phabricator.wikimedia.org/P76779 and previous config saved to /var/cache/conftool/dbconfig/20250602-050150-marostegui.json

Archives

See Server Admin Log/Archives.