Server Admin Log/Archive 98

From Wikitech

2025-10-31

  • 19:59 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudbackup1001-dev.eqiad.wmnet with reason: host reimage
  • 19:52 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudbackup1001-dev.eqiad.wmnet with reason: host reimage
  • 19:42 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudbackup1001-dev.eqiad.wmnet with OS trixie
  • 19:36 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 19:36 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 19:36 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 19:35 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 19:12 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid: apply
  • 19:12 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid: apply
  • 17:30 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 17:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Apply JVM upgrade to 11.0.29 - eevans@cumin1003
  • 16:39 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie
  • 16:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:10 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 16:09 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 15:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
  • 15:50 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
  • 15:48 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:47 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:22 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 15:22 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
  • 15:16 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 15:13 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Apply JVM upgrade to 11.0.29 - eevans@cumin1003
  • 15:03 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie
  • 15:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 14:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 14:58 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T407997)', diff saved to https://phabricator.wikimedia.org/P84563 and previous config saved to /var/cache/conftool/dbconfig/20251031-145834-marostegui.json
  • 14:57 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2009.codfw.wmnet with OS trixie
  • 14:56 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
  • 14:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 14:43 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P84562 and previous config saved to /var/cache/conftool/dbconfig/20251031-144324-marostegui.json
  • 14:28 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P84561 and previous config saved to /var/cache/conftool/dbconfig/20251031-142816-marostegui.json
  • 14:13 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T407997)', diff saved to https://phabricator.wikimedia.org/P84560 and previous config saved to /var/cache/conftool/dbconfig/20251031-141309-marostegui.json
  • 14:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2195 (T407997)', diff saved to https://phabricator.wikimedia.org/P84559 and previous config saved to /var/cache/conftool/dbconfig/20251031-140046-marostegui.json
  • 14:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 14:00 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T407997)', diff saved to https://phabricator.wikimedia.org/P84558 and previous config saved to /var/cache/conftool/dbconfig/20251031-140022-marostegui.json
  • 13:45 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P84557 and previous config saved to /var/cache/conftool/dbconfig/20251031-134514-marostegui.json
  • 13:30 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P84556 and previous config saved to /var/cache/conftool/dbconfig/20251031-133007-marostegui.json
  • 13:25 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:23 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T407997)', diff saved to https://phabricator.wikimedia.org/P84555 and previous config saved to /var/cache/conftool/dbconfig/20251031-131459-marostegui.json
  • 13:01 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2181 (T407997)', diff saved to https://phabricator.wikimedia.org/P84554 and previous config saved to /var/cache/conftool/dbconfig/20251031-130110-marostegui.json
  • 13:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 13:00 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T407997)', diff saved to https://phabricator.wikimedia.org/P84553 and previous config saved to /var/cache/conftool/dbconfig/20251031-130046-marostegui.json
  • 12:45 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P84552 and previous config saved to /var/cache/conftool/dbconfig/20251031-124537-marostegui.json
  • 12:30 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P84551 and previous config saved to /var/cache/conftool/dbconfig/20251031-123030-marostegui.json
  • 12:17 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mwdebug[2001-2002].codfw.wmnet
  • 12:17 jiji@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:17 jiji@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mwdebug[2001-2002].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1003"
  • 12:16 jiji@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mwdebug[2001-2002].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1003"
  • 12:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T407997)', diff saved to https://phabricator.wikimedia.org/P84550 and previous config saved to /var/cache/conftool/dbconfig/20251031-121522-marostegui.json
  • 12:01 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2167 (T407997)', diff saved to https://phabricator.wikimedia.org/P84549 and previous config saved to /var/cache/conftool/dbconfig/20251031-120132-marostegui.json
  • 12:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 12:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T407997)', diff saved to https://phabricator.wikimedia.org/P84548 and previous config saved to /var/cache/conftool/dbconfig/20251031-120108-marostegui.json
  • 11:54 jiji@cumin1003: START - Cookbook sre.dns.netbox
  • 11:53 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie
  • 11:49 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 11:47 jiji@cumin1003: START - Cookbook sre.hosts.decommission for hosts mwdebug[2001-2002].codfw.wmnet
  • 11:47 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mwdebug[1001-1002].eqiad.wmnet
  • 11:47 jiji@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:46 jiji@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mwdebug[1001-1002].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1003"
  • 11:46 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P84547 and previous config saved to /var/cache/conftool/dbconfig/20251031-114600-marostegui.json
  • 11:45 jiji@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mwdebug[1001-1002].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1003"
  • 11:41 jiji@cumin1003: START - Cookbook sre.dns.netbox
  • 11:33 jiji@cumin1003: START - Cookbook sre.hosts.decommission for hosts mwdebug[1001-1002].eqiad.wmnet
  • 11:32 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts es2034.codfw.wmnet
  • 11:32 fceratto@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:32 fceratto@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es2034.codfw.wmnet decommissioned, removing all IPs except the asset tag one - fceratto@cumin1003"
  • 11:32 fceratto@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es2034.codfw.wmnet decommissioned, removing all IPs except the asset tag one - fceratto@cumin1003"
  • 11:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
  • 11:30 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P84546 and previous config saved to /var/cache/conftool/dbconfig/20251031-113052-marostegui.json
  • 11:28 fceratto@cumin1003: START - Cookbook sre.dns.netbox
  • 11:23 fceratto@cumin1003: START - Cookbook sre.hosts.decommission for hosts es2034.codfw.wmnet
  • 11:21 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts es2033.codfw.wmnet
  • 11:21 fceratto@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:21 fceratto@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es2033.codfw.wmnet decommissioned, removing all IPs except the asset tag one - fceratto@cumin1003"
  • 11:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T407997)', diff saved to https://phabricator.wikimedia.org/P84545 and previous config saved to /var/cache/conftool/dbconfig/20251031-111544-marostegui.json
  • 11:14 fceratto@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es2033.codfw.wmnet decommissioned, removing all IPs except the asset tag one - fceratto@cumin1003"
  • 11:10 cmooney@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2009.codfw.wmnet with OS trixie
  • 11:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 11:06 fceratto@cumin1003: START - Cookbook sre.dns.netbox
  • 11:00 fceratto@cumin1003: START - Cookbook sre.hosts.decommission for hosts es2033.codfw.wmnet
  • 10:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2166 (T407997)', diff saved to https://phabricator.wikimedia.org/P84544 and previous config saved to /var/cache/conftool/dbconfig/20251031-105956-marostegui.json
  • 10:59 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 10:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T407997)', diff saved to https://phabricator.wikimedia.org/P84543 and previous config saved to /var/cache/conftool/dbconfig/20251031-105932-marostegui.json
  • 10:55 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts es2032.codfw.wmnet
  • 10:55 fceratto@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:55 fceratto@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es2032.codfw.wmnet decommissioned, removing all IPs except the asset tag one - fceratto@cumin1003"
  • 10:52 cmooney@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 10:50 fceratto@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es2032.codfw.wmnet decommissioned, removing all IPs except the asset tag one - fceratto@cumin1003"
  • 10:46 cmooney@cumin1003: START - Cookbook sre.hosts.provision for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 10:46 cmooney@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 10:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P84542 and previous config saved to /var/cache/conftool/dbconfig/20251031-104424-marostegui.json
  • 10:42 fceratto@cumin1003: START - Cookbook sre.dns.netbox
  • 10:39 cmooney@cumin1003: START - Cookbook sre.hosts.provision for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 10:37 fceratto@cumin1003: START - Cookbook sre.hosts.decommission for hosts es2032.codfw.wmnet
  • 10:35 cmooney@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 10:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P84541 and previous config saved to /var/cache/conftool/dbconfig/20251031-102916-marostegui.json
  • 10:27 taavi@deploy2002: mwscript-k8s job started: namespaceDupes.php --wiki=crhwiki '--add-prefix=BROKEN ' --fix # T408284
  • 10:27 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 45014
  • 10:26 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 45014
  • 10:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T407997)', diff saved to https://phabricator.wikimedia.org/P84540 and previous config saved to /var/cache/conftool/dbconfig/20251031-101409-marostegui.json
  • 10:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
  • 10:03 cmooney@cumin1003: START - Cookbook sre.hosts.provision for host sretest2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 09:59 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2164 (T407997)', diff saved to https://phabricator.wikimedia.org/P84539 and previous config saved to /var/cache/conftool/dbconfig/20251031-095818-marostegui.json
  • 09:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 09:57 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T407997)', diff saved to https://phabricator.wikimedia.org/P84538 and previous config saved to /var/cache/conftool/dbconfig/20251031-095754-marostegui.json
  • 09:53 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts es2027.codfw.wmnet
  • 09:53 fceratto@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:53 fceratto@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es2027.codfw.wmnet decommissioned, removing all IPs except the asset tag one - fceratto@cumin1003"
  • 09:53 fceratto@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es2027.codfw.wmnet decommissioned, removing all IPs except the asset tag one - fceratto@cumin1003"
  • 09:51 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:50 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:42 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P84537 and previous config saved to /var/cache/conftool/dbconfig/20251031-094246-marostegui.json
  • 09:41 fceratto@cumin1003: START - Cookbook sre.dns.netbox
  • 09:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 09:39 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 09:36 fceratto@cumin1003: START - Cookbook sre.hosts.decommission for hosts es2027.codfw.wmnet
  • 09:35 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
  • 09:32 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1090.eqiad.wmnet with OS bullseye
  • 09:32 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 09:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
  • 09:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P84536 and previous config saved to /var/cache/conftool/dbconfig/20251031-092738-marostegui.json
  • 09:27 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 09:17 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1090.eqiad.wmnet with reason: host reimage
  • 09:14 mvernon@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1090.eqiad.wmnet with reason: host reimage
  • 09:12 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T407997)', diff saved to https://phabricator.wikimedia.org/P84535 and previous config saved to /var/cache/conftool/dbconfig/20251031-091230-marostegui.json
  • 09:01 mvernon@cumin1003: START - Cookbook sre.hosts.reimage for host ms-be1090.eqiad.wmnet with OS bullseye
  • 08:59 marostegui@cumin1003: dbctl commit (dc=all): 'db1214 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84534 and previous config saved to /var/cache/conftool/dbconfig/20251031-085934-root.json
  • 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2163 (T407997)', diff saved to https://phabricator.wikimedia.org/P84533 and previous config saved to /var/cache/conftool/dbconfig/20251031-085852-marostegui.json
  • 08:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T407997)', diff saved to https://phabricator.wikimedia.org/P84532 and previous config saved to /var/cache/conftool/dbconfig/20251031-085827-marostegui.json
  • 08:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010
  • 08:52 elukey@cumin2002: START - Cookbook sre.hosts.powercycle for host sretest2010
  • 08:44 marostegui@cumin1003: dbctl commit (dc=all): 'db1214 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84531 and previous config saved to /var/cache/conftool/dbconfig/20251031-084428-root.json
  • 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P84530 and previous config saved to /var/cache/conftool/dbconfig/20251031-084320-marostegui.json
  • 08:29 marostegui@cumin1003: dbctl commit (dc=all): 'db1214 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84529 and previous config saved to /var/cache/conftool/dbconfig/20251031-082923-root.json
  • 08:28 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P84528 and previous config saved to /var/cache/conftool/dbconfig/20251031-082812-marostegui.json
  • 08:14 marostegui@cumin1003: dbctl commit (dc=all): 'db1214 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84527 and previous config saved to /var/cache/conftool/dbconfig/20251031-081417-root.json
  • 08:13 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T407997)', diff saved to https://phabricator.wikimedia.org/P84526 and previous config saved to /var/cache/conftool/dbconfig/20251031-081304-marostegui.json
  • 08:06 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1214 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84525 and previous config saved to /var/cache/conftool/dbconfig/20251031-080633-marostegui.json
  • 08:06 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 07:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2161 (T407997)', diff saved to https://phabricator.wikimedia.org/P84524 and previous config saved to /var/cache/conftool/dbconfig/20251031-075931-marostegui.json
  • 07:59 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 07:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T407997)', diff saved to https://phabricator.wikimedia.org/P84523 and previous config saved to /var/cache/conftool/dbconfig/20251031-075907-marostegui.json
  • 07:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P84522 and previous config saved to /var/cache/conftool/dbconfig/20251031-074359-marostegui.json
  • 07:28 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P84521 and previous config saved to /var/cache/conftool/dbconfig/20251031-072852-marostegui.json
  • 07:13 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T407997)', diff saved to https://phabricator.wikimedia.org/P84520 and previous config saved to /var/cache/conftool/dbconfig/20251031-071344-marostegui.json
  • 07:12 marostegui@cumin1003: dbctl commit (dc=all): 'db2173 (re)pooling @ 100%: After upgrading', diff saved to https://phabricator.wikimedia.org/P84519 and previous config saved to /var/cache/conftool/dbconfig/20251031-071243-root.json
  • 07:04 marostegui@cumin1003: dbctl commit (dc=all): 'db1226 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84518 and previous config saved to /var/cache/conftool/dbconfig/20251031-070422-root.json
  • 06:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2154 (T407997)', diff saved to https://phabricator.wikimedia.org/P84517 and previous config saved to /var/cache/conftool/dbconfig/20251031-065953-marostegui.json
  • 06:59 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 06:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T407997)', diff saved to https://phabricator.wikimedia.org/P84516 and previous config saved to /var/cache/conftool/dbconfig/20251031-065929-marostegui.json
  • 06:57 marostegui@cumin1003: dbctl commit (dc=all): 'db2173 (re)pooling @ 75%: After upgrading', diff saved to https://phabricator.wikimedia.org/P84515 and previous config saved to /var/cache/conftool/dbconfig/20251031-065737-root.json
  • 06:49 marostegui@cumin1003: dbctl commit (dc=all): 'db1226 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84514 and previous config saved to /var/cache/conftool/dbconfig/20251031-064916-root.json
  • 06:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P84513 and previous config saved to /var/cache/conftool/dbconfig/20251031-064422-marostegui.json
  • 06:42 marostegui@cumin1003: dbctl commit (dc=all): 'db2173 (re)pooling @ 50%: After upgrading', diff saved to https://phabricator.wikimedia.org/P84512 and previous config saved to /var/cache/conftool/dbconfig/20251031-064231-root.json
  • 06:34 marostegui@cumin1003: dbctl commit (dc=all): 'db1226 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84511 and previous config saved to /var/cache/conftool/dbconfig/20251031-063410-root.json
  • 06:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P84510 and previous config saved to /var/cache/conftool/dbconfig/20251031-062914-marostegui.json
  • 06:27 marostegui@cumin1003: dbctl commit (dc=all): 'db2173 (re)pooling @ 25%: After upgrading', diff saved to https://phabricator.wikimedia.org/P84509 and previous config saved to /var/cache/conftool/dbconfig/20251031-062725-root.json
  • 06:19 marostegui@cumin1003: dbctl commit (dc=all): 'db1226 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84508 and previous config saved to /var/cache/conftool/dbconfig/20251031-061904-root.json
  • 06:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T407997)', diff saved to https://phabricator.wikimedia.org/P84507 and previous config saved to /var/cache/conftool/dbconfig/20251031-061406-marostegui.json
  • 06:12 marostegui@cumin1003: dbctl commit (dc=all): 'db2173 (re)pooling @ 10%: After upgrading', diff saved to https://phabricator.wikimedia.org/P84506 and previous config saved to /var/cache/conftool/dbconfig/20251031-061219-root.json
  • 06:11 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1226 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84505 and previous config saved to /var/cache/conftool/dbconfig/20251031-061110-marostegui.json
  • 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 06:04 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2173 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84504 and previous config saved to /var/cache/conftool/dbconfig/20251031-060405-marostegui.json
  • 06:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 06:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2152 (T407997)', diff saved to https://phabricator.wikimedia.org/P84503 and previous config saved to /var/cache/conftool/dbconfig/20251031-060012-marostegui.json
  • 06:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 03:21 ejegg: restarted donations queue consumer
  • 03:01 ejegg: temporarily disabled donations queue consumer to get Acoustic export to work
  • 02:48 tstarling@deploy2002: Finished scap sync-world: Backport for Enable ChangesListQuery partitioning on all wikis (T403798) (duration: 40m 01s)
  • 02:43 tstarling@deploy2002: tstarling: Continuing with sync
  • 02:12 tstarling@deploy2002: tstarling: Backport for Enable ChangesListQuery partitioning on all wikis (T403798) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 02:08 tstarling@deploy2002: Started scap sync-world: Backport for Enable ChangesListQuery partitioning on all wikis (T403798)
  • 01:19 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 18m 13s)
  • 01:15 mutante: upgraded envoyproxy on lists2001, aphlict1002, aphlict2001 T405808
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
  • 00:18 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on tcp-proxy1002.eqiad.wmnet with reason: in setup
  • 00:17 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on tcp-proxy1001.eqiad.wmnet with reason: in setup

2025-10-30

  • 23:48 mutante: forward-fixing to puppet7 on tcp-proxy1001/1002 per T349619 T408532
  • 23:35 tstarling@deploy2002: Finished scap sync-world: Backport for Enable ChangesListQuery partitioning on enwiki and commonswiki (T403798) (duration: 14m 33s)
  • 23:27 tstarling@deploy2002: tstarling: Continuing with sync
  • 23:25 tstarling@deploy2002: tstarling: Backport for Enable ChangesListQuery partitioning on enwiki and commonswiki (T403798) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:21 tstarling@deploy2002: Started scap sync-world: Backport for Enable ChangesListQuery partitioning on enwiki and commonswiki (T403798)
  • 23:04 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudbackup1002-dev.eqiad.wmnet with OS trixie
  • 23:00 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices2005-dev.codfw.wmnet with OS trixie
  • 22:56 tstarling@deploy2002: Finished scap sync-world: Backport for Enable ChangesListQuery partitioning on mediawikiwiki (T403798) (duration: 40m 21s)
  • 22:42 tstarling@deploy2002: tstarling: Continuing with sync
  • 22:42 tstarling@deploy2002: tstarling: Backport for Enable ChangesListQuery partitioning on mediawikiwiki (T403798) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:33 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 22:33 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T407997)', diff saved to https://phabricator.wikimedia.org/P84502 and previous config saved to /var/cache/conftool/dbconfig/20251030-223331-marostegui.json
  • 22:32 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Apply JVM upgrade to 11.0.29 - eevans@cumin1003
  • 22:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P84501 and previous config saved to /var/cache/conftool/dbconfig/20251030-221824-marostegui.json
  • 22:15 tstarling@deploy2002: Started scap sync-world: Backport for Enable ChangesListQuery partitioning on mediawikiwiki (T403798)
  • 22:04 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage
  • 22:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P84500 and previous config saved to /var/cache/conftool/dbconfig/20251030-220316-marostegui.json
  • 21:57 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage
  • 21:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T407997)', diff saved to https://phabricator.wikimedia.org/P84499 and previous config saved to /var/cache/conftool/dbconfig/20251030-214808-marostegui.json
  • 21:48 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices2005-dev.codfw.wmnet with reason: host reimage
  • 21:44 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudbackup1002-dev.eqiad.wmnet with OS trixie
  • 21:42 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices2005-dev.codfw.wmnet with reason: host reimage
  • 21:36 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1226 (T407997)', diff saved to https://phabricator.wikimedia.org/P84498 and previous config saved to /var/cache/conftool/dbconfig/20251030-213649-marostegui.json
  • 21:36 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 21:36 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T407997)', diff saved to https://phabricator.wikimedia.org/P84497 and previous config saved to /var/cache/conftool/dbconfig/20251030-213625-marostegui.json
  • 21:25 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudservices2005-dev.codfw.wmnet with OS trixie
  • 21:21 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P84496 and previous config saved to /var/cache/conftool/dbconfig/20251030-212117-marostegui.json
  • 21:17 sbassett: Deployed updated security mitigation for T407131
  • 21:06 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P84495 and previous config saved to /var/cache/conftool/dbconfig/20251030-210610-marostegui.json
  • 21:05 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 21:05 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 20:51 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T407997)', diff saved to https://phabricator.wikimedia.org/P84493 and previous config saved to /var/cache/conftool/dbconfig/20251030-205102-marostegui.json
  • 20:39 kharlan@deploy2002: Finished scap sync-world: Backport for EventBus: Enable TYPE_EVENT for loginwiki (T408701), hCaptcha: Enable 100% passive mode for edits on test2wiki (T405586) (duration: 12m 08s)
  • 20:39 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1214 (T407997)', diff saved to https://phabricator.wikimedia.org/P84492 and previous config saved to /var/cache/conftool/dbconfig/20251030-203933-marostegui.json
  • 20:39 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 20:39 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T407997)', diff saved to https://phabricator.wikimedia.org/P84491 and previous config saved to /var/cache/conftool/dbconfig/20251030-203910-marostegui.json
  • 20:35 kharlan@deploy2002: kharlan: Continuing with sync
  • 20:29 kharlan@deploy2002: kharlan: Backport for EventBus: Enable TYPE_EVENT for loginwiki (T408701), hCaptcha: Enable 100% passive mode for edits on test2wiki (T405586) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:27 kharlan@deploy2002: Started scap sync-world: Backport for EventBus: Enable TYPE_EVENT for loginwiki (T408701), hCaptcha: Enable 100% passive mode for edits on test2wiki (T405586)
  • 20:27 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Apply JVM upgrade to 11.0.29 - eevans@cumin1003
  • 20:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P84489 and previous config saved to /var/cache/conftool/dbconfig/20251030-202402-marostegui.json
  • 20:18 arlolra@deploy2002: Finished scap sync-world: Backport for Turn off GeoCrumbsUseParserOutputFallback (T390236) (duration: 13m 26s)
  • 20:14 arlolra@deploy2002: arlolra: Continuing with sync
  • 20:08 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P84488 and previous config saved to /var/cache/conftool/dbconfig/20251030-200854-marostegui.json
  • 20:07 arlolra@deploy2002: arlolra: Backport for Turn off GeoCrumbsUseParserOutputFallback (T390236) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:04 arlolra@deploy2002: Started scap sync-world: Backport for Turn off GeoCrumbsUseParserOutputFallback (T390236)
  • 19:53 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T407997)', diff saved to https://phabricator.wikimedia.org/P84487 and previous config saved to /var/cache/conftool/dbconfig/20251030-195347-marostegui.json
  • 19:47 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudbackup1001-dev.eqiad.wmnet with OS trixie
  • 19:41 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1209 (T407997)', diff saved to https://phabricator.wikimedia.org/P84486 and previous config saved to /var/cache/conftool/dbconfig/20251030-194105-marostegui.json
  • 19:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 19:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T407997)', diff saved to https://phabricator.wikimedia.org/P84485 and previous config saved to /var/cache/conftool/dbconfig/20251030-194041-marostegui.json
  • 19:32 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudbackup1001-dev.eqiad.wmnet with reason: host reimage
  • 19:27 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudbackup1001-dev.eqiad.wmnet with reason: host reimage
  • 19:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P84484 and previous config saved to /var/cache/conftool/dbconfig/20251030-192534-marostegui.json
  • 19:13 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudbackup1001-dev.eqiad.wmnet with OS trixie
  • 19:12 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.25 refs T405681
  • 19:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P84483 and previous config saved to /var/cache/conftool/dbconfig/20251030-191026-marostegui.json
  • 19:02 bking@dns1004: END - running authdns-update
  • 19:01 bking@dns1004: START - running authdns-update
  • 18:58 dduvall@deploy2002: Finished scap sync-world: Backport for Do not use special db group (T408540), Do not use special db group (T408540) (duration: 07m 24s)
  • 18:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T407997)', diff saved to https://phabricator.wikimedia.org/P84482 and previous config saved to /var/cache/conftool/dbconfig/20251030-185518-marostegui.json
  • 18:53 dduvall@deploy2002: zabe, dduvall: Continuing with sync
  • 18:53 dduvall@deploy2002: zabe, dduvall: Backport for Do not use special db group (T408540), Do not use special db group (T408540) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:50 dduvall@deploy2002: Started scap sync-world: Backport for Do not use special db group (T408540), Do not use special db group (T408540)
  • 18:42 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1203 (T407997)', diff saved to https://phabricator.wikimedia.org/P84481 and previous config saved to /var/cache/conftool/dbconfig/20251030-184200-marostegui.json
  • 18:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 18:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T407997)', diff saved to https://phabricator.wikimedia.org/P84480 and previous config saved to /var/cache/conftool/dbconfig/20251030-184136-marostegui.json
  • 18:35 ejegg: payments-wiki upgraded from 3db28493 to 0132998e
  • 18:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P84479 and previous config saved to /var/cache/conftool/dbconfig/20251030-182629-marostegui.json
  • 18:18 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.25 refs T405681
  • 18:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P84478 and previous config saved to /var/cache/conftool/dbconfig/20251030-181121-marostegui.json
  • 18:09 dduvall: rolling back group2 from 1.45.0-wmf.25 to wmf.24 due to high rate of `PHP Deprecated: Asking for a replica from groups except dump/vslow is deprecated` errors (T405681)
  • 18:09 dduvall: rolling back group2 from 1.45.0-wmf.25 to wmf.24 due to high rate of `PHP Deprecated: Asking for a replica from groups except dump/vslow is deprecated` errors
  • 18:06 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.25 refs T405681
  • 18:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2028.codfw.wmnet with OS trixie
  • 17:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T407997)', diff saved to https://phabricator.wikimedia.org/P84477 and previous config saved to /var/cache/conftool/dbconfig/20251030-175611-marostegui.json
  • 17:56 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 17:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 17:53 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.25 refs T405681
  • 17:46 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: Apply JVM upgrade to 11.0.29 - eevans@cumin1003
  • 17:42 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1192 (T407997)', diff saved to https://phabricator.wikimedia.org/P84476 and previous config saved to /var/cache/conftool/dbconfig/20251030-174257-marostegui.json
  • 17:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 17:42 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T407997)', diff saved to https://phabricator.wikimedia.org/P84475 and previous config saved to /var/cache/conftool/dbconfig/20251030-174233-marostegui.json
  • 17:36 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:36 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:36 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:36 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:36 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:35 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 17:35 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:35 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:35 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:35 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 17:30 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:30 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:30 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:30 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 17:29 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:29 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:29 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:29 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 17:27 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 17:27 jasmine@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2116-2123,2216-2230].codfw.wmnet
  • 17:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P84474 and previous config saved to /var/cache/conftool/dbconfig/20251030-172726-marostegui.json
  • 17:27 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 17:20 swfrench@deploy2002: Finished scap sync-world: Backport for Enroll 50% of client sessions in PHP 8.3 (T405955) (duration: 16m 44s)
  • 17:14 jasmine@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2116-2123,2216-2230].codfw.wmnet
  • 17:12 jasmine@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2320-2330].codfw.wmnet
  • 17:12 jasmine@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2320-2330].codfw.wmnet
  • 17:12 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P84473 and previous config saved to /var/cache/conftool/dbconfig/20251030-171218-marostegui.json
  • 17:12 swfrench@deploy2002: swfrench: Continuing with sync
  • 17:12 jasmine@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2300-2319].codfw.wmnet
  • 17:12 jasmine@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2300-2319].codfw.wmnet
  • 17:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid: apply
  • 17:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid: apply
  • 17:09 jasmine@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2230-2241].codfw.wmnet
  • 17:08 swfrench@deploy2002: swfrench: Backport for Enroll 50% of client sessions in PHP 8.3 (T405955) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 17:03 swfrench@deploy2002: Started scap sync-world: Backport for Enroll 50% of client sessions in PHP 8.3 (T405955)
  • 17:02 jasmine@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2230-2241].codfw.wmnet
  • 17:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid: apply
  • 17:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid: apply
  • 17:01 jasmine@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2288-2299].codfw.wmnet
  • 17:01 jasmine@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2288-2299].codfw.wmnet
  • 17:00 jasmine@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2040,2043,2045,2048,2052-2054,2063,2079-2084,2096-2101].codfw.wmnet
  • 16:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid: apply
  • 16:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid: apply
  • 16:57 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T407997)', diff saved to https://phabricator.wikimedia.org/P84472 and previous config saved to /var/cache/conftool/dbconfig/20251030-165710-marostegui.json
  • 16:45 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 16:45 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid: apply
  • 16:45 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid: apply
  • 16:45 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 16:44 jasmine@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2040,2043,2045,2048,2052-2054,2063,2079-2084,2096-2101].codfw.wmnet
  • 16:44 jasmine@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2268-2287].codfw.wmnet
  • 16:43 jasmine@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2268-2287].codfw.wmnet
  • 16:43 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1178 (T407997)', diff saved to https://phabricator.wikimedia.org/P84471 and previous config saved to /var/cache/conftool/dbconfig/20251030-164346-marostegui.json
  • 16:43 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 16:43 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T407997)', diff saved to https://phabricator.wikimedia.org/P84470 and previous config saved to /var/cache/conftool/dbconfig/20251030-164322-marostegui.json
  • 16:43 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 16:42 otto@deploy2002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 16:42 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 16:42 otto@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 16:39 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 16:39 otto@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 16:32 jasmine@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2003-2004,2007-2010,2019-2032].codfw.wmnet
  • 16:28 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P84469 and previous config saved to /var/cache/conftool/dbconfig/20251030-162814-marostegui.json
  • 16:22 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-eqiad: Apply JVM upgrade to 11.0.29 - eevans@cumin1003
  • 16:19 jasmine@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2003-2004,2007-2010,2019-2032].codfw.wmnet
  • 16:16 jasmine@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2248-2267].codfw.wmnet
  • 16:16 jasmine@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2248-2267].codfw.wmnet
  • 16:13 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P84468 and previous config saved to /var/cache/conftool/dbconfig/20251030-161306-marostegui.json
  • 16:12 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 16:12 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 16:10 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 16:09 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 16:02 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-codfw: Apply JVM upgrade to 11.0.29 - eevans@cumin1003
  • 15:58 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T407997)', diff saved to https://phabricator.wikimedia.org/P84467 and previous config saved to /var/cache/conftool/dbconfig/20251030-155758-marostegui.json
  • 15:51 marostegui@cumin1003: dbctl commit (dc=all): 'db2170 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84466 and previous config saved to /var/cache/conftool/dbconfig/20251030-155153-root.json
  • 15:51 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 15:51 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 15:44 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1177 (T407997)', diff saved to https://phabricator.wikimedia.org/P84465 and previous config saved to /var/cache/conftool/dbconfig/20251030-154434-marostegui.json
  • 15:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 15:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T407997)', diff saved to https://phabricator.wikimedia.org/P84464 and previous config saved to /var/cache/conftool/dbconfig/20251030-154420-marostegui.json
  • 15:37 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 15:37 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 15:36 marostegui@cumin1003: dbctl commit (dc=all): 'db2170 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84463 and previous config saved to /var/cache/conftool/dbconfig/20251030-153647-root.json
  • 15:36 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 15:36 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 15:35 dancy@deploy2002: Installation of scap version "4.221.0" completed for 165 hosts
  • 15:34 moritzm: installing imagemagick security updates
  • 15:32 moritzm: installing openjdk-21 security updates
  • 15:31 dancy@deploy2002: Installing scap version "4.221.0" for 165 host(s)
  • 15:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P84462 and previous config saved to /var/cache/conftool/dbconfig/20251030-152913-marostegui.json
  • 15:24 cmooney@cumin1003: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 15:24 cmooney@cumin1003: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 15:21 marostegui@cumin1003: dbctl commit (dc=all): 'db2170 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84461 and previous config saved to /var/cache/conftool/dbconfig/20251030-152141-root.json
  • 15:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P84460 and previous config saved to /var/cache/conftool/dbconfig/20251030-151405-marostegui.json
  • 15:09 marostegui@cumin1003: dbctl commit (dc=all): 'db2195 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84459 and previous config saved to /var/cache/conftool/dbconfig/20251030-150946-root.json
  • 15:06 marostegui@cumin1003: dbctl commit (dc=all): 'db2170 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84458 and previous config saved to /var/cache/conftool/dbconfig/20251030-150636-root.json
  • 14:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T407997)', diff saved to https://phabricator.wikimedia.org/P84457 and previous config saved to /var/cache/conftool/dbconfig/20251030-145857-marostegui.json
  • 14:58 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2170 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84456 and previous config saved to /var/cache/conftool/dbconfig/20251030-145831-marostegui.json
  • 14:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 14:54 marostegui@cumin1003: dbctl commit (dc=all): 'db2195 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84455 and previous config saved to /var/cache/conftool/dbconfig/20251030-145440-root.json
  • 14:51 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 14:51 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 14:44 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1172 (T407997)', diff saved to https://phabricator.wikimedia.org/P84454 and previous config saved to /var/cache/conftool/dbconfig/20251030-144452-marostegui.json
  • 14:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 14:39 fnegri@cumin1003: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database pcmwikiquote (T408354)
  • 14:39 marostegui@cumin1003: dbctl commit (dc=all): 'db2195 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84453 and previous config saved to /var/cache/conftool/dbconfig/20251030-143934-root.json
  • 14:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dborch1001.wikimedia.org
  • 14:33 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:32 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Apply JVM upgrade to 11.0.29 - eevans@cumin1003
  • 14:32 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 14:32 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T407997)', diff saved to https://phabricator.wikimedia.org/P84452 and previous config saved to /var/cache/conftool/dbconfig/20251030-143204-marostegui.json
  • 14:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dborch1001.wikimedia.org
  • 14:27 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 14:27 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 14:26 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:26 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 14:24 marostegui@cumin1003: dbctl commit (dc=all): 'db2195 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84451 and previous config saved to /var/cache/conftool/dbconfig/20251030-142428-root.json
  • 14:23 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:16 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P84450 and previous config saved to /var/cache/conftool/dbconfig/20251030-141657-marostegui.json
  • 14:16 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2195 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84449 and previous config saved to /var/cache/conftool/dbconfig/20251030-141638-marostegui.json
  • 14:16 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 14:15 aqu@deploy2002: Finished deploy [analytics/refinery@39e92e9] (thin): Update pageview allowlist THIN [analytics/refinery@39e92e9f] (duration: 01m 16s)
  • 14:13 aqu@deploy2002: Started deploy [analytics/refinery@39e92e9] (thin): Update pageview allowlist THIN [analytics/refinery@39e92e9f]
  • 14:12 aqu@deploy2002: Finished deploy [analytics/refinery@39e92e9]: Update pageview allowlist [analytics/refinery@39e92e9f] (duration: 03m 52s)
  • 14:11 mfossati@deploy2002: Finished scap sync-world: Backport for Add feature flag for pilot wikis about visual changes coming from Wikibase having an icon. (T397258), Disable default user-agent collection. (T384964), [huwiki] Set $wgUploadNavigationUrl (T408298), [ruwiki] Enable WikiLove extension (T408514), [[gerrit:1198626|core-Namespaces: Add R:
  • 14:08 aqu@deploy2002: Started deploy [analytics/refinery@39e92e9]: Update pageview allowlist [analytics/refinery@39e92e9f]
  • 14:07 aqu@deploy2002: Finished deploy [analytics/refinery@39e92e9] (hadoop-test): Update pageview allowlist TEST [analytics/refinery@39e92e9f] (duration: 01m 04s)
  • 14:06 aqu@deploy2002: Started deploy [analytics/refinery@39e92e9] (hadoop-test): Update pageview allowlist TEST [analytics/refinery@39e92e9f]
  • 14:05 mfossati@deploy2002: superpes, bunnypranav, javiermonton, mfossati, seanleong-wmde: Continuing with sync
  • 14:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P84448 and previous config saved to /var/cache/conftool/dbconfig/20251030-140147-marostegui.json
  • 13:50 mfossati@deploy2002: superpes, bunnypranav, javiermonton, mfossati, seanleong-wmde: Backport for Add feature flag for pilot wikis about visual changes coming from Wikibase having an icon. (T397258), Disable default user-agent collection. (T384964), [huwiki] Set $wgUploadNavigationUrl (T408298), [ruwiki] Enable WikiLove extension (T408514), [[g
  • {{safesubst:SAL entry|1=13:47 mfossati@deploy2002: Started scap sync-world: Backport for Add feature flag for pilot wikis about visual changes coming from Wikibase having an icon. (T397258), Disable default user-agent collection. (T384964), [huwiki] Set $wgUploadNavigationUrl (T408298), [ruwiki] Enable WikiLove extension (T408514), [[gerrit:1198626|core-Namespaces: Add R:}}
  • 13:46 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T407997)', diff saved to https://phabricator.wikimedia.org/P84447 and previous config saved to /var/cache/conftool/dbconfig/20251030-134639-marostegui.json
  • 13:42 fnegri@cumin1003: START - Cookbook sre.wikireplicas.add-wiki for database pcmwikiquote (T408354)
  • 13:42 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:42 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 13:36 mfossati@deploy2002: Finished scap sync-world: Backport for Localisation updates from https://translatewiki.net., Style adjustments (T408618), Capture more captions (duration: 23m 05s)
  • 13:36 fnegri@cumin1003: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database minwikisource (T408346)
  • 13:36 fnegri@cumin1003: START - Cookbook sre.wikireplicas.add-wiki for database minwikisource (T408346)
  • 13:34 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on 60 hosts with reason: downtime new nokia devices in case they alert during tests
  • 13:32 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1167 (T407997)', diff saved to https://phabricator.wikimedia.org/P84446 and previous config saved to /var/cache/conftool/dbconfig/20251030-133243-marostegui.json
  • 13:32 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:32 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 13:32 mfossati@deploy2002: mfossati: Continuing with sync
  • 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki-root1001.eqiad.wmnet
  • 13:23 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 13:22 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 13:20 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2028.codfw.wmnet with reason: host reimage
  • 13:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki-root1001.eqiad.wmnet
  • 13:18 mfossati@deploy2002: mfossati: Backport for Localisation updates from https://translatewiki.net., Style adjustments (T408618), Capture more captions synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:16 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es2028.codfw.wmnet with reason: host reimage
  • 13:15 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve2001.codfw.wmnet
  • 13:15 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve2001.codfw.wmnet
  • 13:13 mfossati@deploy2002: Started scap sync-world: Backport for Localisation updates from https://translatewiki.net., Style adjustments (T408618), Capture more captions
  • 12:58 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host es2028.codfw.wmnet with OS trixie
  • 12:52 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host es2028.codfw.wmnet with OS trixie
  • 12:31 moritzm: installing nginx security updates
  • 12:06 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 12:06 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 12:05 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 12:04 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 12:04 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 12:03 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 12:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2028.codfw.wmnet with reason: host reimage
  • 12:01 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es2028.codfw.wmnet with reason: host reimage
  • 11:45 fnegri@cumin1003: START - Cookbook sre.wikireplicas.add-wiki for database minwikisource (T408346)
  • 11:45 moritzm: installing pdns-recursor security updates
  • 11:43 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host es2028.codfw.wmnet with OS trixie
  • 11:34 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:34 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:25 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:25 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:19 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:18 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:36 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 10:36 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T407997)', diff saved to https://phabricator.wikimedia.org/P84445 and previous config saved to /var/cache/conftool/dbconfig/20251030-103626-marostegui.json
  • 10:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1033.eqiad.wmnet with OS trixie
  • 10:21 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P84444 and previous config saved to /var/cache/conftool/dbconfig/20251030-102118-marostegui.json
  • 10:06 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P84443 and previous config saved to /var/cache/conftool/dbconfig/20251030-100611-marostegui.json
  • 09:51 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T407997)', diff saved to https://phabricator.wikimedia.org/P84442 and previous config saved to /var/cache/conftool/dbconfig/20251030-095103-marostegui.json
  • 09:50 moritzm: import prometheus-statsd-exporter to trixie-wikimedia T407513
  • 09:48 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1231 (T407997)', diff saved to https://phabricator.wikimedia.org/P84441 and previous config saved to /var/cache/conftool/dbconfig/20251030-094854-marostegui.json
  • 09:48 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 09:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 09:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T407997)', diff saved to https://phabricator.wikimedia.org/P84440 and previous config saved to /var/cache/conftool/dbconfig/20251030-094409-marostegui.json
  • 09:34 eileen: civicrm upgraded from f8802d27 to ed25fa88
  • 09:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P84439 and previous config saved to /var/cache/conftool/dbconfig/20251030-092901-marostegui.json
  • 09:13 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P84438 and previous config saved to /var/cache/conftool/dbconfig/20251030-091354-marostegui.json
  • 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T407997)', diff saved to https://phabricator.wikimedia.org/P84437 and previous config saved to /var/cache/conftool/dbconfig/20251030-085846-marostegui.json
  • 08:56 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1187 (T407997)', diff saved to https://phabricator.wikimedia.org/P84436 and previous config saved to /var/cache/conftool/dbconfig/20251030-085636-marostegui.json
  • 08:56 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 08:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T407997)', diff saved to https://phabricator.wikimedia.org/P84435 and previous config saved to /var/cache/conftool/dbconfig/20251030-085613-marostegui.json
  • 08:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P84434 and previous config saved to /var/cache/conftool/dbconfig/20251030-084105-marostegui.json
  • 08:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P84433 and previous config saved to /var/cache/conftool/dbconfig/20251030-082558-marostegui.json
  • 08:23 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1033.eqiad.wmnet with reason: host reimage
  • 08:23 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet with reason: Fixing triggers
  • 08:22 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1155.eqiad.wmnet with reason: Upgrade
  • 08:18 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es1033.eqiad.wmnet with reason: host reimage
  • 08:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T407997)', diff saved to https://phabricator.wikimedia.org/P84432 and previous config saved to /var/cache/conftool/dbconfig/20251030-081050-marostegui.json
  • 08:10 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 08:08 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1180 (T407997)', diff saved to https://phabricator.wikimedia.org/P84431 and previous config saved to /var/cache/conftool/dbconfig/20251030-080840-marostegui.json
  • 08:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 08:08 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T407997)', diff saved to https://phabricator.wikimedia.org/P84430 and previous config saved to /var/cache/conftool/dbconfig/20251030-080816-marostegui.json
  • 07:54 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host es1033.eqiad.wmnet with OS trixie
  • 07:53 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host es1033.eqiad.wmnet with OS trixie
  • 07:53 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P84429 and previous config saved to /var/cache/conftool/dbconfig/20251030-075308-marostegui.json
  • 07:38 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P84428 and previous config saved to /var/cache/conftool/dbconfig/20251030-073801-marostegui.json
  • 07:22 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T407997)', diff saved to https://phabricator.wikimedia.org/P84427 and previous config saved to /var/cache/conftool/dbconfig/20251030-072253-marostegui.json
  • 07:20 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1173 (T407997)', diff saved to https://phabricator.wikimedia.org/P84426 and previous config saved to /var/cache/conftool/dbconfig/20251030-072043-marostegui.json
  • 07:20 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 07:20 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T407997)', diff saved to https://phabricator.wikimedia.org/P84425 and previous config saved to /var/cache/conftool/dbconfig/20251030-072020-marostegui.json
  • 07:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P84424 and previous config saved to /var/cache/conftool/dbconfig/20251030-070512-marostegui.json
  • 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1033.eqiad.wmnet with reason: host reimage
  • 06:50 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es1033.eqiad.wmnet with reason: host reimage
  • 06:50 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P84423 and previous config saved to /var/cache/conftool/dbconfig/20251030-065004-marostegui.json
  • 06:42 marostegui@cumin1003: dbctl commit (dc=all): 'db2153 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84422 and previous config saved to /var/cache/conftool/dbconfig/20251030-064250-root.json
  • 06:34 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T407997)', diff saved to https://phabricator.wikimedia.org/P84421 and previous config saved to /var/cache/conftool/dbconfig/20251030-063457-marostegui.json
  • 06:32 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1168 (T407997)', diff saved to https://phabricator.wikimedia.org/P84420 and previous config saved to /var/cache/conftool/dbconfig/20251030-063247-marostegui.json
  • 06:32 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 06:32 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T407997)', diff saved to https://phabricator.wikimedia.org/P84419 and previous config saved to /var/cache/conftool/dbconfig/20251030-063223-marostegui.json
  • 06:27 marostegui@cumin1003: dbctl commit (dc=all): 'db2153 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84418 and previous config saved to /var/cache/conftool/dbconfig/20251030-062744-root.json
  • 06:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P84417 and previous config saved to /var/cache/conftool/dbconfig/20251030-061715-marostegui.json
  • 06:15 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host es1033.eqiad.wmnet with OS trixie
  • 06:12 marostegui@cumin1003: dbctl commit (dc=all): 'db2153 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84416 and previous config saved to /var/cache/conftool/dbconfig/20251030-061238-root.json
  • 06:02 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P84415 and previous config saved to /var/cache/conftool/dbconfig/20251030-060208-marostegui.json
  • 06:00 marostegui@cumin1003: dbctl commit (dc=all): 'Remove es1033 from dbctl T408772', diff saved to https://phabricator.wikimedia.org/P84414 and previous config saved to /var/cache/conftool/dbconfig/20251030-060018-marostegui.json
  • 05:57 marostegui@cumin1003: dbctl commit (dc=all): 'db2153 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84413 and previous config saved to /var/cache/conftool/dbconfig/20251030-055732-root.json
  • 05:49 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2153 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84412 and previous config saved to /var/cache/conftool/dbconfig/20251030-054923-marostegui.json
  • 05:49 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 05:47 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T407997)', diff saved to https://phabricator.wikimedia.org/P84411 and previous config saved to /var/cache/conftool/dbconfig/20251030-054659-marostegui.json
  • 05:44 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1165 (T407997)', diff saved to https://phabricator.wikimedia.org/P84410 and previous config saved to /var/cache/conftool/dbconfig/20251030-054449-marostegui.json
  • 05:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 12s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-29

  • 22:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 (T407997)', diff saved to https://phabricator.wikimedia.org/P84409 and previous config saved to /var/cache/conftool/dbconfig/20251029-225501-marostegui.json
  • 22:39 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P84408 and previous config saved to /var/cache/conftool/dbconfig/20251029-223952-marostegui.json
  • 22:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P84407 and previous config saved to /var/cache/conftool/dbconfig/20251029-222445-marostegui.json
  • 22:16 jdlrobson@deploy2002: Finished scap sync-world: Backport for Deploy dark mode everywhere (T395628) (duration: 10m 30s)
  • 22:14 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 22:12 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 22:12 jdlrobson@deploy2002: jdlrobson: Continuing with sync
  • 22:11 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 22:11 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 22:09 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 (T407997)', diff saved to https://phabricator.wikimedia.org/P84406 and previous config saved to /var/cache/conftool/dbconfig/20251029-220937-marostegui.json
  • 22:08 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 22:08 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 22:08 jdlrobson@deploy2002: jdlrobson: Backport for Deploy dark mode everywhere (T395628) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:08 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 22:08 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 22:08 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 22:07 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 22:05 jdlrobson@deploy2002: Started scap sync-world: Backport for Deploy dark mode everywhere (T395628)
  • 22:03 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2229 (T407997)', diff saved to https://phabricator.wikimedia.org/P84405 and previous config saved to /var/cache/conftool/dbconfig/20251029-220341-marostegui.json
  • 22:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance
  • 22:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T407997)', diff saved to https://phabricator.wikimedia.org/P84404 and previous config saved to /var/cache/conftool/dbconfig/20251029-220317-marostegui.json
  • 22:03 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 22:03 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 22:03 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 22:02 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 22:02 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 22:02 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 21:48 dduvall@deploy2002: Finished scap sync-world: Backport for EntitySourceDefinitions: use false as DB name if pointing to current wiki (T408525), recentchanges API result contains wrong entries with redirect: False (T408667) (duration: 08m 15s)
  • 21:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P84403 and previous config saved to /var/cache/conftool/dbconfig/20251029-214808-marostegui.json
  • 21:47 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 21:47 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 21:45 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 21:44 dduvall@deploy2002: tstarling, dduvall: Continuing with sync
  • 21:44 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 21:43 dduvall@deploy2002: tstarling, dduvall: Backport for EntitySourceDefinitions: use false as DB name if pointing to current wiki (T408525), recentchanges API result contains wrong entries with redirect: False (T408667) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:40 dduvall@deploy2002: Started scap sync-world: Backport for EntitySourceDefinitions: use false as DB name if pointing to current wiki (T408525), recentchanges API result contains wrong entries with redirect: False (T408667)
  • 21:33 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P84402 and previous config saved to /var/cache/conftool/dbconfig/20251029-213300-marostegui.json
  • 21:22 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tcp-proxy3002.esams.wmnet with OS trixie
  • 21:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T407997)', diff saved to https://phabricator.wikimedia.org/P84401 and previous config saved to /var/cache/conftool/dbconfig/20251029-211752-marostegui.json
  • 21:16 cjming: end of UTC late backport window
  • 21:16 cjming@deploy2002: Finished scap sync-world: Backport for PHP client library: Fixed spelling for `mediawiki_database` (T408717), PHP client library: Fixed spelling for `mediawiki_database` (T408717) (duration: 08m 30s)
  • 21:12 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 21:12 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2224 (T407997)', diff saved to https://phabricator.wikimedia.org/P84400 and previous config saved to /var/cache/conftool/dbconfig/20251029-211153-marostegui.json
  • 21:11 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 21:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2224.codfw.wmnet with reason: Maintenance
  • 21:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T407997)', diff saved to https://phabricator.wikimedia.org/P84399 and previous config saved to /var/cache/conftool/dbconfig/20251029-211129-marostegui.json
  • 21:10 cjming@deploy2002: cjming, sfaci: Continuing with sync
  • 21:10 cjming@deploy2002: cjming, sfaci: Backport for PHP client library: Fixed spelling for `mediawiki_database` (T408717), PHP client library: Fixed spelling for `mediawiki_database` (T408717) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:07 cjming@deploy2002: Started scap sync-world: Backport for PHP client library: Fixed spelling for `mediawiki_database` (T408717), PHP client library: Fixed spelling for `mediawiki_database` (T408717)
  • 21:05 mutante: adding TLS support to zookeeper as a feature flag - no existing zookeeper server will change
  • 21:04 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tcp-proxy3002.esams.wmnet with reason: host reimage
  • 20:58 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on tcp-proxy3002.esams.wmnet with reason: host reimage
  • 20:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 20:57 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 20:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P84398 and previous config saved to /var/cache/conftool/dbconfig/20251029-205621-marostegui.json
  • 20:52 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 20:52 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 20:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P84397 and previous config saved to /var/cache/conftool/dbconfig/20251029-204113-marostegui.json
  • 20:36 arlolra@deploy2002: Finished scap sync-world: Backport for ExtensionDistributor: Mark 1.45 as beta (T408466) (duration: 08m 51s)
  • 20:32 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy3002.esams.wmnet with OS trixie
  • 20:31 arlolra@deploy2002: arlolra: Continuing with sync
  • 20:31 arlolra@deploy2002: arlolra: Backport for ExtensionDistributor: Mark 1.45 as beta (T408466) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:28 arlolra@deploy2002: Started scap sync-world: Backport for ExtensionDistributor: Mark 1.45 as beta (T408466)
  • 20:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T407997)', diff saved to https://phabricator.wikimedia.org/P84395 and previous config saved to /var/cache/conftool/dbconfig/20251029-202605-marostegui.json
  • 20:25 cjming@deploy2002: mwscript-k8s job started: namespaceDupes pcmwikiquote --fix # T408351
  • 20:23 cjming@deploy2002: mwscript-k8s job started: namespaceDupes minwikisource --fix # T408343
  • 20:20 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2217 (T407997)', diff saved to https://phabricator.wikimedia.org/P84394 and previous config saved to /var/cache/conftool/dbconfig/20251029-201958-marostegui.json
  • 20:19 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 20:17 cjming@deploy2002: Finished scap sync-world: Backport for minwikisource: add portal namespace, set sitename, timezone and project namespace, pcmwikiquote: set timezone, sitename and projectnamespace (T408351), pcmwikiquote: add logos (T408351), minwikisource: add logos (T408343) (duration: 08m 57s)
  • 20:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 20:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T407997)', diff saved to https://phabricator.wikimedia.org/P84393 and previous config saved to /var/cache/conftool/dbconfig/20251029-201406-marostegui.json
  • 20:13 cjming@deploy2002: anzx, cjming: Continuing with sync
  • 20:11 cjming@deploy2002: anzx, cjming: Backport for minwikisource: add portal namespace, set sitename, timezone and project namespace, pcmwikiquote: set timezone, sitename and projectnamespace (T408351), pcmwikiquote: add logos (T408351), minwikisource: add logos (T408343) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug).
  • 20:08 cjming@deploy2002: Started scap sync-world: Backport for minwikisource: add portal namespace, set sitename, timezone and project namespace, pcmwikiquote: set timezone, sitename and projectnamespace (T408351), pcmwikiquote: add logos (T408351), minwikisource: add logos (T408343)
  • 19:58 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P84392 and previous config saved to /var/cache/conftool/dbconfig/20251029-195855-marostegui.json
  • 19:43 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P84391 and previous config saved to /var/cache/conftool/dbconfig/20251029-194347-marostegui.json
  • 19:28 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T407997)', diff saved to https://phabricator.wikimedia.org/P84390 and previous config saved to /var/cache/conftool/dbconfig/20251029-192839-marostegui.json
  • 19:26 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2193 (T407997)', diff saved to https://phabricator.wikimedia.org/P84389 and previous config saved to /var/cache/conftool/dbconfig/20251029-192627-marostegui.json
  • 19:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 19:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T407997)', diff saved to https://phabricator.wikimedia.org/P84388 and previous config saved to /var/cache/conftool/dbconfig/20251029-192603-marostegui.json
  • 19:16 SandraEbele_: Deployed refinery-source
  • 19:12 jasmine: 'homer on multiple lsw1-*-codfw*' T390859
  • 19:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P84387 and previous config saved to /var/cache/conftool/dbconfig/20251029-191055-marostegui.json
  • 19:10 dancy@deploy2002: Finished scap sync-world: Testing scap 4.22.0 (duration: 03m 30s)
  • 19:09 swfrench-wmf: rolling run-puppet-agent on A:cp hosts for haproxy config change
  • 19:07 dancy@deploy2002: Started scap sync-world: Testing scap 4.22.0
  • 19:02 dancy@deploy2002: Installation of scap version "4.220.0" completed for 2 hosts
  • 19:00 dancy@deploy2002: Installing scap version "4.220.0" for 2 host(s)
  • 18:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P84386 and previous config saved to /var/cache/conftool/dbconfig/20251029-185547-marostegui.json
  • 18:46 swfrench-wmf: disable-puppet on A:cp hosts for haproxy config change
  • 18:41 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 18:41 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 18:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T407997)', diff saved to https://phabricator.wikimedia.org/P84385 and previous config saved to /var/cache/conftool/dbconfig/20251029-184039-marostegui.json
  • 18:38 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2180 (T407997)', diff saved to https://phabricator.wikimedia.org/P84384 and previous config saved to /var/cache/conftool/dbconfig/20251029-183827-marostegui.json
  • 18:38 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 18:38 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T407997)', diff saved to https://phabricator.wikimedia.org/P84383 and previous config saved to /var/cache/conftool/dbconfig/20251029-183802-marostegui.json
  • 18:35 mutante: gitlab1003 systemctl start backup-restore T408705
  • 18:31 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.25 refs T405681
  • 18:26 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 18:25 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 18:23 dduvall: rolling back 1.45.0-wmf.25 from group1 due to spike in `PHP Deprecated: Deprecated cross-wiki access to MediaWiki\Revision\RevisionRecord` errors (T408525)
  • 18:22 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P84381 and previous config saved to /var/cache/conftool/dbconfig/20251029-182253-marostegui.json
  • 18:17 krinkle@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
  • 18:17 krinkle@deploy2002: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
  • 18:11 SandraEbele_: deploying refinery source as part of deployment train.
  • 18:07 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P84380 and previous config saved to /var/cache/conftool/dbconfig/20251029-180746-marostegui.json
  • 17:52 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 17:52 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T407997)', diff saved to https://phabricator.wikimedia.org/P84379 and previous config saved to /var/cache/conftool/dbconfig/20251029-175238-marostegui.json
  • 17:52 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 17:46 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2169 (T407997)', diff saved to https://phabricator.wikimedia.org/P84378 and previous config saved to /var/cache/conftool/dbconfig/20251029-174616-marostegui.json
  • 17:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 17:46 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T407997)', diff saved to https://phabricator.wikimedia.org/P84377 and previous config saved to /var/cache/conftool/dbconfig/20251029-174602-marostegui.json
  • 17:42 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 17:42 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 17:42 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:42 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 17:42 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:42 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 17:41 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:41 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:41 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:41 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:39 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:39 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 17:39 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:38 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 17:38 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:38 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:37 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:37 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:36 mutante: upgrade envoy on phab2002, vrts2002, contint2002 T405808
  • 17:31 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:31 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:30 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P84376 and previous config saved to /var/cache/conftool/dbconfig/20251029-173055-marostegui.json
  • 17:29 mutante: upgrade envoy on phab2002
  • 17:25 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:24 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:23 swfrench@deploy2002: Finished scap sync-world: Backport for Enroll 25% of client sessions in PHP 8.3 (T405955) (duration: 10m 08s)
  • 17:19 swfrench@deploy2002: swfrench: Continuing with sync
  • 17:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P84375 and previous config saved to /var/cache/conftool/dbconfig/20251029-171547-marostegui.json
  • 17:15 swfrench@deploy2002: swfrench: Backport for Enroll 25% of client sessions in PHP 8.3 (T405955) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 17:13 swfrench@deploy2002: Started scap sync-world: Backport for Enroll 25% of client sessions in PHP 8.3 (T405955)
  • 17:08 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 17:08 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 17:07 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 17:07 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 17:05 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 17:05 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 17:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 17:04 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 17:00 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T407997)', diff saved to https://phabricator.wikimedia.org/P84374 and previous config saved to /var/cache/conftool/dbconfig/20251029-170039-marostegui.json
  • 16:50 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2158 (T407997)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20251029-165010-marostegui.json
  • 16:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 16:50 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T407997)', diff saved to https://phabricator.wikimedia.org/P84372 and previous config saved to /var/cache/conftool/dbconfig/20251029-164954-marostegui.json
  • 16:39 fceratto@cumin1003: dbctl commit (dc=all): 'Remove es2034 from dbctl T408414', diff saved to https://phabricator.wikimedia.org/P84371 and previous config saved to /var/cache/conftool/dbconfig/20251029-163859-fceratto.json
  • 16:35 mutante: welcome new deployer Sean Leong - https://meta.wikimedia.org/wiki/User:Sean_Leong_(WMDE) T406592
  • 16:34 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to and previous config saved to /var/cache/conftool/dbconfig/20251029-163446-marostegui.json
  • 16:30 fceratto@cumin1003: dbctl commit (dc=all): 'Remove es2033 from dbctl T408412', diff saved to https://phabricator.wikimedia.org/P84369 and previous config saved to /var/cache/conftool/dbconfig/20251029-163021-fceratto.json
  • 16:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host tcp-proxy2001.codfw.wmnet
  • 16:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tcp-proxy2001.codfw.wmnet with OS trixie
  • 16:27 fceratto@cumin1003: dbctl commit (dc=all): 'Remove es2032 from dbctl T408411', diff saved to https://phabricator.wikimedia.org/P84368 and previous config saved to /var/cache/conftool/dbconfig/20251029-162711-fceratto.json
  • 16:20 sergi0: `sgimeno@deploy2002:~$ mwscript-k8s --comment="T407366" --dblist="growthexperiments" --follow -- GrowthExperiments:purgeExpiredMentorStatus.php` (T407366)
  • 16:19 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P84367 and previous config saved to /var/cache/conftool/dbconfig/20251029-161938-marostegui.json
  • 16:19 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on people2003.codfw.wmnet with reason: decom
  • 16:18 mutante: shutting down people1004.eqiad.wmnet, people2003.codfw.wmnet - T408713 T402596
  • 16:18 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on people1004.eqiad.wmnet with reason: decom
  • 16:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tcp-proxy2001.codfw.wmnet with reason: host reimage
  • 16:11 mutante: upgrade Envoy on etherpad* T405808
  • 16:10 mutante: upgrade Envoy on stewards* T405808
  • 16:07 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on tcp-proxy2001.codfw.wmnet with reason: host reimage
  • 16:04 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T407997)', diff saved to https://phabricator.wikimedia.org/P84366 and previous config saved to /var/cache/conftool/dbconfig/20251029-160430-marostegui.json
  • 16:02 mutante: upgrade Envoy on planet* T405808
  • 15:56 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2151 (T407997)', diff saved to https://phabricator.wikimedia.org/P84365 and previous config saved to /var/cache/conftool/dbconfig/20251029-155605-marostegui.json
  • 15:55 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 15:55 mutante: upgrade Envoy on doc* T405808
  • 15:55 marostegui@cumin1003: dbctl commit (dc=all): 'Pool db2169 with full weight', diff saved to https://phabricator.wikimedia.org/P84364 and previous config saved to /var/cache/conftool/dbconfig/20251029-155520-marostegui.json
  • 15:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:52 dancy@deploy2002: Installation of scap version "4.219.0" completed for 165 hosts
  • 15:50 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2012.codfw.wmnet
  • 15:50 mutante: upgrade Envoy on zuul* T405808
  • 15:49 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy2001.codfw.wmnet with OS trixie
  • 15:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy2001.codfw.wmnet - jmm@cumin2002"
  • 15:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy2001.codfw.wmnet - jmm@cumin2002"
  • 15:48 dancy@deploy2002: Installing scap version "4.219.0" for 165 host(s)
  • 15:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy2001.codfw.wmnet on all recursors
  • 15:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy2001.codfw.wmnet on all recursors
  • 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy2001.codfw.wmnet - jmm@cumin2002"
  • 15:47 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs2012.codfw.wmnet
  • 15:47 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2012.codfw.wmnet
  • 15:47 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2012.codfw.wmnet
  • 15:47 mutante: upgrade Envoy on releases* T405808
  • 15:47 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T407997)', diff saved to https://phabricator.wikimedia.org/P84363 and previous config saved to /var/cache/conftool/dbconfig/20251029-154659-marostegui.json
  • 15:45 mutante: upgrade Envoy on people* T405808
  • 15:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy2001.codfw.wmnet - jmm@cumin2002"
  • 15:44 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2169 (T407997)', diff saved to https://phabricator.wikimedia.org/P84362 and previous config saved to /var/cache/conftool/dbconfig/20251029-154448-marostegui.json
  • 15:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 15:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T407997)', diff saved to https://phabricator.wikimedia.org/P84361 and previous config saved to /var/cache/conftool/dbconfig/20251029-154424-marostegui.json
  • 15:41 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 15:41 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy2001.codfw.wmnet
  • 15:38 fabfur@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: T407110
  • 15:37 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2012.codfw.wmnet
  • 15:36 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2012.codfw.wmnet
  • 15:33 fabfur: reboot lvs2012 (T407110)
  • 15:30 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2014.codfw.wmnet
  • 15:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P84360 and previous config saved to /var/cache/conftool/dbconfig/20251029-152916-marostegui.json
  • 15:28 _joe_: restarted mailman3-web on lists1004
  • 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Remove es2028 from dbctl T408407', diff saved to https://phabricator.wikimedia.org/P84359 and previous config saved to /var/cache/conftool/dbconfig/20251029-152440-fceratto.json
  • 15:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P84358 and previous config saved to /var/cache/conftool/dbconfig/20251029-151409-marostegui.json
  • 15:13 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2034 - Depool es2034 T408414
  • 15:13 fceratto@cumin1003: START - Cookbook sre.mysql.depool es2034 - Depool es2034 T408414
  • 15:13 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2033 - Depool es2033 T408412
  • 15:12 fceratto@cumin1003: START - Cookbook sre.mysql.depool es2033 - Depool es2033 T408412
  • 15:12 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2032 - Depool es2032 T408411
  • 15:12 fceratto@cumin1003: START - Cookbook sre.mysql.depool es2032 - Depool es2032 T408411
  • 15:11 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2028 - Depool es2028 T408407
  • 15:11 fceratto@cumin1003: START - Cookbook sre.mysql.depool es2028 - Depool es2028 T408407
  • 15:10 fceratto@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) es2029 - Depool es2029 T408408
  • 15:10 fceratto@cumin1003: START - Cookbook sre.mysql.depool es2029 - Depool es2029 T408408
  • 15:09 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2028 - Depool es2028 T408407
  • 15:08 fceratto@cumin1003: START - Cookbook sre.mysql.depool es2028 - Depool es2028 T408407
  • 15:06 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs2014.codfw.wmnet
  • 15:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts tcp-proxy2001.codfw.wmnet
  • 15:06 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:04 fabfur: reboot lvs2014 (T407110)
  • 15:04 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 15:03 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 15:03 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 14:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T407997)', diff saved to https://phabricator.wikimedia.org/P84353 and previous config saved to /var/cache/conftool/dbconfig/20251029-145901-marostegui.json
  • 14:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 14:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 14:56 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 14:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 14:54 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2158 (T407997)', diff saved to https://phabricator.wikimedia.org/P84352 and previous config saved to /var/cache/conftool/dbconfig/20251029-145450-marostegui.json
  • 14:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 14:54 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T407997)', diff saved to https://phabricator.wikimedia.org/P84351 and previous config saved to /var/cache/conftool/dbconfig/20251029-145425-marostegui.json
  • 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 14:44 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts tcp-proxy2001.codfw.wmnet
  • 14:40 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host ml-serve2001
  • 14:39 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P84350 and previous config saved to /var/cache/conftool/dbconfig/20251029-143918-marostegui.json
  • 14:37 elukey@cumin2002: START - Cookbook sre.hosts.powercycle for host ml-serve2001
  • 14:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P84349 and previous config saved to /var/cache/conftool/dbconfig/20251029-142410-marostegui.json
  • 14:09 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 14:09 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T407997)', diff saved to https://phabricator.wikimedia.org/P84348 and previous config saved to /var/cache/conftool/dbconfig/20251029-140902-marostegui.json
  • 14:09 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 14:06 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2151 (T407997)', diff saved to https://phabricator.wikimedia.org/P84347 and previous config saved to /var/cache/conftool/dbconfig/20251029-140652-marostegui.json
  • 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Depool es2027 T408406', diff saved to https://phabricator.wikimedia.org/P84346 and previous config saved to /var/cache/conftool/dbconfig/20251029-140641-fceratto.json
  • 14:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 13:56 stevemunene@cumin1003: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 13:43 gehel: deploying envoy 1.32.12-1 + restart on W[CD]QS nodes - T404867
  • 13:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tcp-proxy3002.esams.wmnet with OS trixie
  • 13:31 kharlan@deploy2002: Finished scap sync-world: Backport for product_metrics/suggested_investigations_interaction: add performer_groups (T404177) (duration: 14m 48s)
  • 13:31 moritzm: upgrade Envoy on debmonitor* T405808
  • 13:31 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 13:30 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 13:30 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 13:29 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 13:29 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 13:28 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 13:27 kharlan@deploy2002: kharlan: Continuing with sync
  • 13:26 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tcp-proxy3002.esams.wmnet with reason: host reimage
  • 13:23 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 13:23 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 13:22 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 13:19 kharlan@deploy2002: kharlan: Backport for product_metrics/suggested_investigations_interaction: add performer_groups (T404177) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:18 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on tcp-proxy3002.esams.wmnet with reason: host reimage
  • 13:17 kharlan@deploy2002: Started scap sync-world: Backport for product_metrics/suggested_investigations_interaction: add performer_groups (T404177)
  • 13:14 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 13:14 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 13:07 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 13:07 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 13:05 stevemunene@cumin1003: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
  • 13:04 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 13:04 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 13:03 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 13:03 stevemunene@cumin1003: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 13:03 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy3002.esams.wmnet with OS trixie
  • 12:50 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 12:50 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 12:46 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 12:46 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 12:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tcp-proxy7001.magru.wmnet with OS trixie
  • 12:45 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:43 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:43 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 12:33 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 12:33 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 12:32 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 12:32 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 12:31 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 12:30 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 12:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tcp-proxy7001.magru.wmnet with reason: host reimage
  • 12:26 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 12:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on tcp-proxy7001.magru.wmnet with reason: host reimage
  • 12:20 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 12:19 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 12:19 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 12:17 stevemunene@cumin1003: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 12:14 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 12:14 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 12:08 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 12:08 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 12:08 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:08 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 12:05 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 12:04 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 12:01 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 12:01 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:57 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:57 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:53 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:53 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:53 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:53 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy7001.magru.wmnet with OS trixie
  • 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tcp-proxy7002.magru.wmnet with OS trixie
  • 11:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tcp-proxy7002.magru.wmnet with reason: host reimage
  • 11:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on tcp-proxy7002.magru.wmnet with reason: host reimage
  • 11:29 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:28 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:25 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:25 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:17 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:16 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:15 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:15 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:10 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 11:10 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 11:08 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 11:08 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 11:07 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:07 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:04 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy7002.magru.wmnet with OS trixie
  • 11:00 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:55 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:55 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:53 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:52 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices2004-dev.codfw.wmnet with OS trixie
  • 10:38 pmiazga@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:38 pmiazga@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:33 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 10:33 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 10:33 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
  • 10:32 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
  • 10:08 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 09:59 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host tcp-proxy7001.magru.wmnet with OS trixie
  • 09:55 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host tcp-proxy7001.magru.wmnet with OS trixie
  • 09:54 ozge@deploy2002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 09:52 joelyrookewmde@deploy2002: mwscript-k8s job started: foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https # Add wikidata support ticket for minwikisource T408347 and pcmwikiquote T408355
  • 09:52 ozge@deploy2002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 09:51 ozge@deploy2002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 09:49 kharlan@deploy2002: Finished scap sync-world: Backport for hCaptcha: Fix usage of wmgEmergencyCaptcha in closure (T405586) (duration: 11m 51s)
  • 09:48 ozge@deploy2002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 09:47 ozge@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 09:46 ozge@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 09:45 kharlan@deploy2002: kharlan: Continuing with sync
  • 09:42 kharlan@deploy2002: kharlan: Backport for hCaptcha: Fix usage of wmgEmergencyCaptcha in closure (T405586) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:37 kharlan@deploy2002: Started scap sync-world: Backport for hCaptcha: Fix usage of wmgEmergencyCaptcha in closure (T405586)
  • 09:33 kharlan@deploy2002: Finished scap sync-world: Backport for hCaptcha: Enable hCaptcha for form edits on testwiki (T405586) (duration: 11m 41s)
  • 09:29 kharlan@deploy2002: kharlan: Continuing with sync
  • 09:24 kharlan@deploy2002: kharlan: Backport for hCaptcha: Enable hCaptcha for form edits on testwiki (T405586) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:22 kharlan@deploy2002: Started scap sync-world: Backport for hCaptcha: Enable hCaptcha for form edits on testwiki (T405586)
  • 09:19 cgoubert@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 09:17 cgoubert@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 09:03 kharlan@deploy2002: kharlan: Continuing with sync
  • 09:01 kharlan@deploy2002: kharlan: Backport for SI: Use minimalist keys to reduce action_context size (T408546), SI: Use minimalist keys to reduce action_context size (T408546) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:59 kharlan@deploy2002: Started scap sync-world: Backport for SI: Use minimalist keys to reduce action_context size (T408546), SI: Use minimalist keys to reduce action_context size (T408546)
  • 08:49 marostegui@cumin1003: dbctl commit (dc=all): 'sretest2003 (re)pooling @ 100%: Pooling for the first time in es7', diff saved to https://phabricator.wikimedia.org/P84341 and previous config saved to /var/cache/conftool/dbconfig/20251029-084929-root.json
  • 08:44 moritzm: installing Jetty security updates
  • 08:44 kharlan@deploy2002: Finished scap sync-world: Backport for Metrics Platform PHP client library: performer_registration_dt won't be added to the user when the user is anon (T408547), Metrics Platform PHP client library: performer_registration_dt won't be added to the user when the user is anon (T408547) (duration: 08m 47s)
  • 08:40 kharlan@deploy2002: sfaci, kharlan: Continuing with sync
  • 08:38 kharlan@deploy2002: sfaci, kharlan: Backport for Metrics Platform PHP client library: performer_registration_dt won't be added to the user when the user is anon (T408547), Metrics Platform PHP client library: performer_registration_dt won't be added to the user when the user is anon (T408547) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can
  • 08:35 kharlan@deploy2002: Started scap sync-world: Backport for Metrics Platform PHP client library: performer_registration_dt won't be added to the user when the user is anon (T408547), Metrics Platform PHP client library: performer_registration_dt won't be added to the user when the user is anon (T408547)
  • 08:34 marostegui@cumin1003: dbctl commit (dc=all): 'sretest2003 (re)pooling @ 75%: Pooling for the first time in es7', diff saved to https://phabricator.wikimedia.org/P84340 and previous config saved to /var/cache/conftool/dbconfig/20251029-083423-root.json
  • 08:19 marostegui@cumin1003: dbctl commit (dc=all): 'sretest2003 (re)pooling @ 60%: Pooling for the first time in es7', diff saved to https://phabricator.wikimedia.org/P84339 and previous config saved to /var/cache/conftool/dbconfig/20251029-081914-root.json
  • 08:04 marostegui@cumin1003: dbctl commit (dc=all): 'sretest2003 (re)pooling @ 50%: Pooling for the first time in es7', diff saved to https://phabricator.wikimedia.org/P84338 and previous config saved to /var/cache/conftool/dbconfig/20251029-080408-root.json
  • 07:50 moritzm: upgrading Java on puppet servers
  • 07:49 marostegui@cumin1003: dbctl commit (dc=all): 'sretest2003 (re)pooling @ 30%: Pooling for the first time in es7', diff saved to https://phabricator.wikimedia.org/P84337 and previous config saved to /var/cache/conftool/dbconfig/20251029-074902-root.json
  • 07:38 krinkle@deploy2002: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
  • 07:37 krinkle@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 07:33 marostegui@cumin1003: dbctl commit (dc=all): 'sretest2003 (re)pooling @ 25%: Pooling for the first time in es7', diff saved to https://phabricator.wikimedia.org/P84336 and previous config saved to /var/cache/conftool/dbconfig/20251029-073354-root.json
  • 07:33 krinkle@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 07:18 marostegui@cumin1003: dbctl commit (dc=all): 'sretest2003 (re)pooling @ 20%: Pooling for the first time in es7', diff saved to https://phabricator.wikimedia.org/P84335 and previous config saved to /var/cache/conftool/dbconfig/20251029-071848-root.json
  • 07:08 marostegui@cumin1003: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84334 and previous config saved to /var/cache/conftool/dbconfig/20251029-070838-root.json
  • 07:03 marostegui@cumin1003: dbctl commit (dc=all): 'sretest2003 (re)pooling @ 10%: Pooling for the first time in es7', diff saved to https://phabricator.wikimedia.org/P84333 and previous config saved to /var/cache/conftool/dbconfig/20251029-070342-root.json
  • 06:53 marostegui@cumin1003: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84332 and previous config saved to /var/cache/conftool/dbconfig/20251029-065330-root.json
  • 06:48 marostegui@cumin1003: dbctl commit (dc=all): 'sretest2003 (re)pooling @ 7%: Pooling for the first time in es7', diff saved to https://phabricator.wikimedia.org/P84331 and previous config saved to /var/cache/conftool/dbconfig/20251029-064835-root.json
  • 06:38 marostegui@cumin1003: dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84330 and previous config saved to /var/cache/conftool/dbconfig/20251029-063823-root.json
  • 06:33 marostegui@cumin1003: dbctl commit (dc=all): 'sretest2003 (re)pooling @ 5%: Pooling for the first time in es7', diff saved to https://phabricator.wikimedia.org/P84329 and previous config saved to /var/cache/conftool/dbconfig/20251029-063329-root.json
  • 06:23 marostegui@cumin1003: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84328 and previous config saved to /var/cache/conftool/dbconfig/20251029-062317-root.json
  • 06:18 marostegui@cumin1003: dbctl commit (dc=all): 'sretest2003 (re)pooling @ 1%: Pooling for the first time in es7', diff saved to https://phabricator.wikimedia.org/P84327 and previous config saved to /var/cache/conftool/dbconfig/20251029-061823-root.json
  • 06:16 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 06:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1163.eqiad.wmnet with reason: Upgrade
  • 06:14 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.upgrade (exit_code=97) for db1163.eqiad.wmnet
  • 06:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1163 - Upgrading db1163.eqiad.wmnet
  • 06:06 marostegui@cumin1003: START - Cookbook sre.mysql.depool db1163 - Upgrading db1163.eqiad.wmnet
  • 06:05 marostegui@cumin1003: START - Cookbook sre.mysql.upgrade for db1163.eqiad.wmnet
  • 06:05 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of es2040.codfw.wmnet onto sretest2003.codfw.wmnet
  • 06:05 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2040 gradually with 4 steps - Pool es2040.codfw.wmnet in after cloning
  • 06:03 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1163 T407975', diff saved to https://phabricator.wikimedia.org/P84325 and previous config saved to /var/cache/conftool/dbconfig/20251029-060356-marostegui.json
  • 06:03 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1184 to s1 primary T407975', diff saved to https://phabricator.wikimedia.org/P84324 and previous config saved to /var/cache/conftool/dbconfig/20251029-060314-marostegui.json
  • 06:02 marostegui: Starting s1 eqiad failover from db1163 to db1184 - T407975
  • 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts es1032.eqiad.wmnet
  • 06:01 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:01 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es1032.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 06:01 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es1032.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 05:58 marostegui@cumin1003: START - Cookbook sre.dns.netbox
  • 05:58 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1184 with weight 0 T407975', diff saved to https://phabricator.wikimedia.org/P84323 and previous config saved to /var/cache/conftool/dbconfig/20251029-055813-marostegui.json
  • 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s1 T407975
  • 05:52 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts es1032.eqiad.wmnet
  • 05:40 marostegui@cumin1003: dbctl commit (dc=all): 'Remove es1032 from dbctl T408662', diff saved to https://phabricator.wikimedia.org/P84321 and previous config saved to /var/cache/conftool/dbconfig/20251029-054019-marostegui.json
  • 05:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool es2040 gradually with 4 steps - Pool es2040.codfw.wmnet in after cloning
  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 15s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
  • 00:23 apine@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 00:22 apine@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 00:22 apine@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 00:21 apine@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 00:20 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 00:20 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 00:16 apine@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 00:14 apine@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 00:14 apine@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 00:13 apine@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 00:13 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 00:12 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply

2025-10-28

  • 23:46 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:44 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 23:40 rzl@deploy2002: Finished scap sync-world: https://gerrit.wikimedia.org/r/1199519 T405808 (duration: 03m 34s)
  • 23:38 rzl@deploy2002: Started scap sync-world: https://gerrit.wikimedia.org/r/1199519 T405808
  • 23:37 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host tcp-proxy3002.esams.wmnet with OS trixie
  • 23:35 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 23:35 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 23:33 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 23:32 rzl@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 23:26 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host tcp-proxy7001.magru.wmnet with OS trixie
  • 23:26 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host tcp-proxy2002.codfw.wmnet
  • 23:26 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tcp-proxy2002.codfw.wmnet with OS trixie
  • 23:09 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tcp-proxy2002.codfw.wmnet with reason: host reimage
  • 23:03 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on tcp-proxy2002.codfw.wmnet with reason: host reimage
  • 22:43 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy3002.esams.wmnet with OS trixie
  • 22:42 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy2002.codfw.wmnet with OS trixie
  • 22:42 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy2002.codfw.wmnet - dzahn@cumin2002"
  • 22:42 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy2002.codfw.wmnet - dzahn@cumin2002"
  • 22:42 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 22:42 rzl@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 22:42 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy2002.codfw.wmnet on all recursors
  • 22:42 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy2002.codfw.wmnet on all recursors
  • 22:41 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:41 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy2002.codfw.wmnet - dzahn@cumin2002"
  • 22:41 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy2002.codfw.wmnet - dzahn@cumin2002"
  • 22:39 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 22:38 rzl@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 22:37 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 22:37 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy2002.codfw.wmnet
  • 22:33 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy7001.magru.wmnet with OS trixie
  • 21:28 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 21:28 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 21:01 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host tcp-proxy7002.magru.wmnet
  • 21:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tcp-proxy7002.magru.wmnet with OS trixie
  • 20:44 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tcp-proxy7002.magru.wmnet with reason: host reimage
  • 20:38 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on tcp-proxy7002.magru.wmnet with reason: host reimage
  • 20:29 Msz2001: Deployed change to private Suggested Investigations code
  • 20:25 dzahn@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host tcp-proxy7001.magru.wmnet
  • 20:25 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host tcp-proxy7001.magru.wmnet with OS trixie
  • 20:12 mszwarc@deploy2002: Finished scap sync-world: Backport for hCaptcha: Store risk score in cache, so that jobs can use it (T408542), hCaptcha: Store risk score in cache, so that jobs can use it (T408542) (duration: 07m 27s)
  • 20:08 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2006-dev.codfw.wmnet with OS trixie
  • 20:07 mszwarc@deploy2002: mszwarc: Continuing with sync
  • 20:07 mszwarc@deploy2002: mszwarc: Backport for hCaptcha: Store risk score in cache, so that jobs can use it (T408542), hCaptcha: Store risk score in cache, so that jobs can use it (T408542) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:04 mszwarc@deploy2002: Started scap sync-world: Backport for hCaptcha: Store risk score in cache, so that jobs can use it (T408542), hCaptcha: Store risk score in cache, so that jobs can use it (T408542)
  • 20:01 brett@dns1004: END - running authdns-update
  • 20:00 brett@dns1004: START - running authdns-update
  • 19:58 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy7002.magru.wmnet with OS trixie
  • 19:58 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy7002.magru.wmnet - dzahn@cumin2002"
  • 19:58 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy7002.magru.wmnet - dzahn@cumin2002"
  • 19:57 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy7002.magru.wmnet on all recursors
  • 19:57 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy7002.magru.wmnet on all recursors
  • 19:57 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:57 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy7002.magru.wmnet - dzahn@cumin2002"
  • 19:57 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy7002.magru.wmnet - dzahn@cumin2002"
  • 19:51 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 19:51 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy7002.magru.wmnet
  • 19:49 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices2004-dev.codfw.wmnet with reason: host reimage
  • 19:45 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices2004-dev.codfw.wmnet with reason: host reimage
  • 19:44 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
  • 19:40 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
  • 19:39 jhathaway@dns1004: END - running authdns-update
  • 19:37 jhathaway@dns1004: START - running authdns-update
  • 19:31 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy7001.magru.wmnet with OS trixie
  • 19:31 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy7001.magru.wmnet - dzahn@cumin2002"
  • 19:31 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy7001.magru.wmnet - dzahn@cumin2002"
  • 19:30 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy7001.magru.wmnet on all recursors
  • 19:30 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy7001.magru.wmnet on all recursors
  • 19:30 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:30 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy7001.magru.wmnet - dzahn@cumin2002"
  • 19:30 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy7001.magru.wmnet - dzahn@cumin2002"
  • 19:28 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudservices2004-dev.codfw.wmnet with OS trixie
  • 19:26 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 19:26 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy7001.magru.wmnet
  • 19:24 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudnet2006-dev.codfw.wmnet with OS trixie
  • 19:23 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2005-dev.codfw.wmnet with OS trixie
  • 19:00 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2005-dev.codfw.wmnet with reason: host reimage
  • 18:51 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2005-dev.codfw.wmnet with reason: host reimage
  • 18:43 cstone: civicrm upgraded from 3819e60c to f8802d27
  • 18:42 ejegg: donorwiki upgraded from ea963482 to 09caf170
  • 18:40 ejegg: payments-wiki upgraded from 5f72d7b3 to 09caf170
  • 18:35 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudnet2005-dev.codfw.wmnet with OS trixie
  • 18:14 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.25 refs T405681
  • 18:03 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 18:02 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 18:02 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 18:02 otto@deploy2002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 18:01 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 18:01 otto@deploy2002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 17:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depool es2027 T408406', diff saved to https://phabricator.wikimedia.org/P84318 and previous config saved to /var/cache/conftool/dbconfig/20251028-173348-fceratto.json
  • 17:29 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:28 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 17:28 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:28 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 17:28 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:27 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:27 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:27 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:26 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:26 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 17:25 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:25 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 17:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:23 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:12 marostegui@cumin1003: START - Cookbook sre.mysql.clone of es2040.codfw.wmnet onto sretest2003.codfw.wmnet
  • 17:11 swfrench@deploy2002: Finished scap sync-world: Backport for Enroll 10% of client sessions in PHP 8.3 (T405955) (duration: 08m 30s)
  • 17:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es2040.codfw.wmnet,sretest2003.codfw.wmnet with reason: Cloning sretest2003 from es2040
  • 17:10 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es2040 to clone sretest2003 T407352', diff saved to https://phabricator.wikimedia.org/P84317 and previous config saved to /var/cache/conftool/dbconfig/20251028-170958-marostegui.json
  • 17:08 marostegui@cumin1003: dbctl commit (dc=all): 'Depool sretest2003 T407352', diff saved to https://phabricator.wikimedia.org/P84316 and previous config saved to /var/cache/conftool/dbconfig/20251028-170840-marostegui.json
  • 17:07 swfrench@deploy2002: swfrench: Continuing with sync
  • 17:05 swfrench@deploy2002: swfrench: Backport for Enroll 10% of client sessions in PHP 8.3 (T405955) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 17:02 swfrench@deploy2002: Started scap sync-world: Backport for Enroll 10% of client sessions in PHP 8.3 (T405955)
  • 16:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
  • 16:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 16:52 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 16:51 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 16:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-growthbook: apply
  • 16:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-growthbook: apply
  • 16:51 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 16:51 otto@deploy2002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 16:44 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts es1031.eqiad.wmnet
  • 16:44 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:44 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es1031.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 16:44 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es1031.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 16:41 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 16:41 otto@deploy2002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 16:40 marostegui@cumin1003: START - Cookbook sre.dns.netbox
  • 16:38 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 16:38 otto@deploy2002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 16:34 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts es1031.eqiad.wmnet
  • 16:32 marostegui@cumin1003: dbctl commit (dc=all): 'Remove es1031 from dbctl T408600', diff saved to https://phabricator.wikimedia.org/P84315 and previous config saved to /var/cache/conftool/dbconfig/20251028-163252-marostegui.json
  • 15:44 jhancock@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ml-serve2001']
  • 15:43 swfrench-wmf: rolling run-puppet-agent on A:cp hosts for haproxy config change
  • 15:41 otto@deploy2002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 15:34 jhancock@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ml-serve2001']
  • 15:23 swfrench-wmf: disable-puppet on A:cp hosts for haproxy config change
  • 15:16 brennen@deploy2002: Finished deploy [phabricator/deployment@5fbb350]: deploy phab1004 for T408575 (duration: 06m 09s)
  • 15:11 swfrench-wmf: applied mediawiki-common network policy updates in mw-script / mw-cron - T309738
  • 15:10 brennen@deploy2002: Started deploy [phabricator/deployment@5fbb350]: deploy phab1004 for T408575
  • 15:09 brennen@deploy2002: Finished deploy [phabricator/deployment@5fbb350]: deploy phab1004 for T408575 (duration: 00m 34s)
  • 15:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
  • 15:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-cron: apply
  • 15:09 brennen@deploy2002: Started deploy [phabricator/deployment@5fbb350]: deploy phab1004 for T408575
  • 15:09 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 15:09 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 15:06 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on phab2002.codfw.wmnet with reason: reboot for kernel
  • 15:05 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on phab1004.eqiad.wmnet with reason: reboot for kernel
  • 14:59 dancy@deploy2002: Installation of scap version "4.218.0" completed for 2 hosts
  • 14:57 dancy@deploy2002: Installing scap version "4.218.0" for 2 host(s)
  • 14:52 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 14:45 hashar: Restarted CI Jenkins
  • 14:42 hashar: Restarting Gerrit
  • 13:43 derick@deploy2002: Finished scap sync-world: Backport for Remove hCaptcha site key from private/readme.php (duration: 08m 58s)
  • 13:39 derick@deploy2002: mszwarc, derick: Continuing with sync
  • 13:38 derick@deploy2002: mszwarc, derick: Backport for Remove hCaptcha site key from private/readme.php synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:34 derick@deploy2002: Started scap sync-world: Backport for Remove hCaptcha site key from private/readme.php
  • 13:32 derick@deploy2002: Finished scap sync-world: Backport for Make wgVectorMaxWidthOptions specify Special:Userlogin correctly (T408447) (duration: 10m 56s)
  • 13:29 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 13:29 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 13:29 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 13:29 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 13:26 derick@deploy2002: derick, matmarex: Continuing with sync
  • 13:25 derick@deploy2002: derick, matmarex: Backport for Make wgVectorMaxWidthOptions specify Special:Userlogin correctly (T408447) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:21 derick@deploy2002: Started scap sync-world: Backport for Make wgVectorMaxWidthOptions specify Special:Userlogin correctly (T408447)
  • 13:17 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host lvs2011.codfw.wmnet
  • 13:14 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 13:14 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 13:06 sukhe@cumin1003: START - Cookbook sre.hosts.reboot-single for host lvs2011.codfw.wmnet
  • 13:00 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling reboot on A:swift-fe-codfw
  • 12:53 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts es2026.codfw.wmnet
  • 12:53 fceratto@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:53 fceratto@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es2026.codfw.wmnet decommissioned, removing all IPs except the asset tag one - fceratto@cumin1003"
  • 12:49 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2003.codfw.wmnet
  • 12:46 sukhe@cumin1003: START - Cookbook sre.hosts.reboot-single for host pybal-test2003.codfw.wmnet
  • 12:45 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2011.codfw.wmnet with reason: reboot
  • 12:24 fceratto@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es2026.codfw.wmnet decommissioned, removing all IPs except the asset tag one - fceratto@cumin1003"
  • 12:04 Msz2001: Deployed changes to Suggested Investigations
  • 11:48 fceratto@cumin1003: START - Cookbook sre.dns.netbox
  • 11:44 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling reboot on A:swift-fe-codfw
  • 11:42 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010
  • 11:42 fceratto@cumin1003: START - Cookbook sre.hosts.decommission for hosts es2026.codfw.wmnet
  • 11:41 elukey@cumin2002: START - Cookbook sre.hosts.powercycle for host sretest2010
  • 11:40 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host ml-serve2001
  • 11:30 elukey@cumin2002: START - Cookbook sre.hosts.powercycle for host ml-serve2001
  • 11:00 zabe@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
  • 10:58 zabe@deploy2002: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
  • 10:50 moritzm: installing openjdk-17 security updates
  • 10:10 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Roll-restart for Java security updates - klausman@cumin1003
  • 09:55 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet
  • 09:52 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Roll-restart for Java security updates - klausman@cumin1003
  • 09:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Roll-restart for Java security updates - klausman@cumin1003
  • 09:49 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet
  • 09:48 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1014.eqiad.wmnet
  • 09:42 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host rdb1014.eqiad.wmnet
  • 09:42 cgoubert@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:39 cgoubert@cumin1003: START - Cookbook sre.dns.netbox
  • 09:39 cgoubert@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:36 cgoubert@cumin1003: START - Cookbook sre.dns.netbox
  • 09:34 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Roll-restart for Java security updates - klausman@cumin1003
  • 09:20 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: OpenJDK security updates - jmm@cumin2002
  • 09:08 kharlan@deploy2002: Finished scap sync-world: Backport for hCaptcha: Enable on loginwiki (T408428) (duration: 16m 35s)
  • 09:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:59 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: OpenJDK security updates - jmm@cumin2002
  • 08:58 kharlan@deploy2002: kharlan: Continuing with sync
  • 08:56 kharlan@deploy2002: kharlan: Backport for hCaptcha: Enable on loginwiki (T408428) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:53 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host ml-serve2001
  • 08:53 elukey@cumin2002: START - Cookbook sre.hosts.powercycle for host ml-serve2001
  • 08:52 kharlan@deploy2002: Started scap sync-world: Backport for hCaptcha: Enable on loginwiki (T408428)
  • 08:49 kharlan@deploy2002: Finished scap sync-world: Backport for CheckUser: Enable SI on metawiki and loginwiki (T408428) (duration: 46m 57s)
  • 08:33 kharlan@deploy2002: kharlan: Continuing with sync
  • 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-serve2001.codfw.wmnet
  • 08:29 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve2001.codfw.wmnet
  • 08:28 moritzm: installing openjdk-11 security updates
  • 08:28 kharlan@deploy2002: kharlan: Backport for CheckUser: Enable SI on metawiki and loginwiki (T408428) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:18 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.powercycle (exit_code=99) for host ml-serve2001
  • 08:18 elukey@cumin2002: START - Cookbook sre.hosts.powercycle for host ml-serve2001
  • 08:13 gehel: restarting blazegraph on wdqs1019 - free allocator decreasing - `sudo depool; sleep 30; sudo systemctl restart wdqs-blazegraph.service; sleep 30; sudo pool`
  • 08:11 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.powercycle (exit_code=99) for host ml-serve2001
  • 08:11 elukey@cumin2002: START - Cookbook sre.hosts.powercycle for host ml-serve2001
  • 08:05 jmm@dns1004: END - running authdns-update
  • 08:04 jmm@dns1004: START - running authdns-update
  • 08:02 kharlan@deploy2002: Started scap sync-world: Backport for CheckUser: Enable SI on metawiki and loginwiki (T408428)
  • 07:43 marostegui: Deploy schema change on the master x1 T407587
  • 07:10 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Managing sanitization for wikis minwikisource in section s5
  • 06:54 marostegui@cumin1003: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis minwikisource in section s5
  • 06:53 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Managing sanitization for wikis pcmwikiquote in section s5
  • 06:44 marostegui@cumin1003: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis pcmwikiquote in section s5
  • 04:02 mwpresync@deploy2002: Pruned MediaWiki: 1.45.0-wmf.22 (duration: 02m 38s)
  • 03:51 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.45.0-wmf.25 refs T405681 (duration: 47m 50s)
  • 03:04 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.45.0-wmf.25 refs T405681
  • 01:32 zabe@deploy2002: Finished scap sync-world: Backport for Activate pcmwikisource (T408318), Activate minwikisource (T408317), Update interwiki cache (duration: 18m 07s)
  • 01:22 zabe@deploy2002: zabe: Continuing with sync
  • 01:18 zabe@deploy2002: zabe: Backport for Activate pcmwikisource (T408318), Activate minwikisource (T408317), Update interwiki cache synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 01:14 zabe@deploy2002: Started scap sync-world: Backport for Activate pcmwikisource (T408318), Activate minwikisource (T408317), Update interwiki cache
  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 19s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
  • 00:57 zabe@deploy2002: Finished scap sync-world: Backport for Initial configuration for pcmwikiquote (T408318), Initial configuration for minwikisource (T408317) (duration: 40m 37s)
  • 00:43 zabe@deploy2002: zabe: Continuing with sync
  • 00:42 zabe@deploy2002: zabe: Backport for Initial configuration for pcmwikiquote (T408318), Initial configuration for minwikisource (T408317) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:16 zabe@deploy2002: Started scap sync-world: Backport for Initial configuration for pcmwikiquote (T408318), Initial configuration for minwikisource (T408317)
  • 00:06 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host tcp-proxy3002.esams.wmnet with OS trixie
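
The cookbook entries above follow a regular shape (`START`, `END (PASS|FAIL|ERROR)` or `DONE (PASS)`, the cookbook name, and an optional `exit_code`), so they can be filtered mechanically. The following is a minimal Python sketch of one way to do that. The regular expressions are inferred only from the entries in this archive, not from any official tooling, and the names `ENTRY_RE`, `COOKBOOK_RE`, `Entry` and `parse_entry` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape of one log bullet: "  • HH:MM actor: message".
# The actor is either "user@host" or a bare nick (e.g. "hashar").
ENTRY_RE = re.compile(
    r"^\s*(?:•\s*)?(?P<time>\d{2}:\d{2}) (?P<actor>[^:\s]+): (?P<message>.*)$"
)

# Assumed shape of a cookbook status message, based on the entries above.
COOKBOOK_RE = re.compile(
    r"^(?P<phase>START|END|DONE)(?: \((?P<result>[A-Z]+)\))? - Cookbook (?P<name>\S+)"
    r"(?: \(exit_code=(?P<exit_code>\d+)\))?"
)


class Entry(NamedTuple):
    time: str
    actor: str
    message: str
    cookbook: Optional[str]   # cookbook name, or None for non-cookbook entries
    phase: Optional[str]      # START, END or DONE
    result: Optional[str]     # PASS, FAIL, ERROR, or None on START
    exit_code: Optional[int]  # None when the entry carries no exit code


def parse_entry(line: str) -> Optional[Entry]:
    """Parse one log bullet; return None for date headings and blank lines."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    c = COOKBOOK_RE.match(m["message"])
    if c is None:
        return Entry(m["time"], m["actor"], m["message"], None, None, None, None)
    code = c["exit_code"]
    return Entry(
        m["time"], m["actor"], m["message"],
        c["name"], c["phase"], c["result"],
        int(code) if code is not None else None,
    )


if __name__ == "__main__":
    sample = [
        "  • 08:18 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.powercycle (exit_code=99) for host ml-serve2001",
        "  • 08:18 elukey@cumin2002: START - Cookbook sre.hosts.powercycle for host ml-serve2001",
        "  • 14:45 hashar: Restarted CI Jenkins",
    ]
    # Print only cookbook runs that ended with a non-zero exit code.
    for raw in sample:
        entry = parse_entry(raw)
        if entry and entry.exit_code not in (None, 0):
            print(entry.time, entry.actor, entry.cookbook, entry.result, entry.exit_code)
```

Feeding a whole day's bullets through `parse_entry` and keeping entries whose `exit_code` is non-zero gives a quick list of failed cookbook runs. Entries that are not cookbook messages are still parsed, with the cookbook fields left as `None`.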

2025-10-27

  • 23:18 mutante: ganeti3005 - sudo ssh-keygen -f "/var/lib/ganeti/known_hosts" -R "ganeti03.svc.esams.wmnet" - revoking offending RSA key
  • 23:12 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy3002.esams.wmnet with OS trixie
  • 23:05 dzahn@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host tcp-proxy3002.esams.wmnet
  • 23:05 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host tcp-proxy3002.esams.wmnet with OS trixie
  • 23:03 rzl: rzl@apt1002:~$ sudo -i reprepro copy trixie-wikimedia bullseye-wikimedia envoyproxy # T405808
  • 23:03 rzl: rzl@apt1002:~$ sudo -i reprepro copy bookworm-wikimedia bullseye-wikimedia envoyproxy # T405808
  • 23:03 rzl: rzl@apt1002:~$ sudo -i reprepro -C main includedeb bullseye-wikimedia /srv/wikimedia/pool/component/envoy-future/e/envoyproxy/envoyproxy_1.32.12-1_amd64.deb # T405808
  • 22:48 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 22:48 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 22:46 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 22:46 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host tcp-proxy3001.esams.wmnet
  • 22:46 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tcp-proxy3001.esams.wmnet with OS trixie
  • 22:46 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 22:44 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 22:43 rzl@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 22:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tcp-proxy3001.esams.wmnet with reason: host reimage
  • 22:23 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on tcp-proxy3001.esams.wmnet with reason: host reimage
  • 22:16 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 60 hosts with reason: downtime new nokia devices in case they alert during tests
  • 22:16 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 22:16 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 22:11 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy3002.esams.wmnet with OS trixie
  • 22:11 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy3002.esams.wmnet - dzahn@cumin2002"
  • 22:07 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy3002.esams.wmnet - dzahn@cumin2002"
  • 22:07 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy3002.esams.wmnet on all recursors
  • 22:07 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy3002.esams.wmnet on all recursors
  • 22:07 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:07 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy3002.esams.wmnet - dzahn@cumin2002"
  • 22:06 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy3002.esams.wmnet - dzahn@cumin2002"
  • 22:02 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 22:02 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy3002.esams.wmnet
  • 21:58 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy3001.esams.wmnet with OS trixie
  • 21:58 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 21:57 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 21:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 21:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 21:56 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy3001.esams.wmnet - dzahn@cumin2002"
  • 21:56 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy3001.esams.wmnet - dzahn@cumin2002"
  • 21:55 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy3001.esams.wmnet on all recursors
  • 21:55 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy3001.esams.wmnet on all recursors
  • 21:55 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:55 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy3001.esams.wmnet - dzahn@cumin2002"
  • 21:55 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy3001.esams.wmnet - dzahn@cumin2002"
  • 21:51 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 21:50 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host tcp-proxy6002.drmrs.wmnet
  • 21:50 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tcp-proxy6002.drmrs.wmnet with OS trixie
  • 21:50 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 21:49 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 21:49 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy3001.esams.wmnet
  • 21:47 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host franio1004
  • 21:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host franio1004
  • 21:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt franio1004 - vriley@cumin1003"
  • 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt franio1004 - vriley@cumin1003"
  • 21:43 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host tcp-proxy1002.eqiad.wmnet
  • 21:43 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tcp-proxy1002.eqiad.wmnet with OS trixie
  • 21:43 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 21:39 bd808@deploy2002: Installation of scap version "4.217.0" completed for 2 hosts
  • 21:37 bd808@deploy2002: Installing scap version "4.217.0" for 2 host(s)
  • 21:36 maryum: Deployed security fix for T385403
  • 21:32 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tcp-proxy6002.drmrs.wmnet with reason: host reimage
  • 21:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tcp-proxy1002.eqiad.wmnet with reason: host reimage
  • 21:23 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on tcp-proxy6002.drmrs.wmnet with reason: host reimage
  • 21:22 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on tcp-proxy1002.eqiad.wmnet with reason: host reimage
  • 21:08 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy1002.eqiad.wmnet with OS trixie
  • 21:07 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy1002.eqiad.wmnet - dzahn@cumin2002"
  • 21:07 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy1002.eqiad.wmnet - dzahn@cumin2002"
  • 21:06 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy1002.eqiad.wmnet on all recursors
  • 21:06 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy1002.eqiad.wmnet on all recursors
  • 21:06 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:06 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy1002.eqiad.wmnet - dzahn@cumin2002"
  • 21:06 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy1002.eqiad.wmnet - dzahn@cumin2002"
  • 21:01 aaron@deploy2002: Finished scap sync-world: Backport for Move rest_v1-wikimedia.json under the wwwportal directory (T396805) (duration: 12m 22s)
  • 21:00 cmooney@cumin1003: START - Cookbook sre.hosts.dhcp for host sretest1006.eqiad.wmnet
  • 20:59 cmooney@cumin1003: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1006.eqiad.wmnet
  • 20:59 cmooney@cumin1003: START - Cookbook sre.hosts.dhcp for host sretest1006.eqiad.wmnet
  • 20:57 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 20:57 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy1002.eqiad.wmnet
  • 20:56 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy6002.drmrs.wmnet with OS trixie
  • 20:55 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy6002.drmrs.wmnet - dzahn@cumin2002"
  • 20:55 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy6002.drmrs.wmnet - dzahn@cumin2002"
  • 20:55 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy6002.drmrs.wmnet on all recursors
  • 20:55 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy6002.drmrs.wmnet on all recursors
  • 20:55 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:55 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy6002.drmrs.wmnet - dzahn@cumin2002"
  • 20:55 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy6002.drmrs.wmnet - dzahn@cumin2002"
  • 20:54 aaron@deploy2002: aaron: Continuing with sync
  • 20:54 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host tcp-proxy4002.ulsfo.wmnet
  • 20:54 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tcp-proxy4002.ulsfo.wmnet with OS trixie
  • 20:53 cmooney@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1006.eqiad.wmnet with OS trixie
  • 20:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1003"
  • 20:52 aaron@deploy2002: aaron: Backport for Move rest_v1-wikimedia.json under the wwwportal directory (T396805) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1003"
  • 20:49 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 20:49 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy6002.drmrs.wmnet
  • 20:48 aaron@deploy2002: Started scap sync-world: Backport for Move rest_v1-wikimedia.json under the wwwportal directory (T396805)
  • 20:46 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host tcp-proxy6001.drmrs.wmnet
  • 20:46 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tcp-proxy6001.drmrs.wmnet with OS trixie
  • 20:44 kemayo@deploy2002: Finished scap sync-world: Backport for Edit check: instrument when pastes happen with known sources (T407302) (duration: 10m 16s)
  • 20:38 kemayo@deploy2002: kemayo: Continuing with sync
  • 20:36 kemayo@deploy2002: kemayo: Backport for Edit check: instrument when pastes happen with known sources (T407302) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:36 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tcp-proxy4002.ulsfo.wmnet with reason: host reimage
  • 20:34 kemayo@deploy2002: Started scap sync-world: Backport for Edit check: instrument when pastes happen with known sources (T407302)
  • 20:32 cmooney@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1006.eqiad.wmnet with reason: host reimage
  • 20:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tcp-proxy6001.drmrs.wmnet with reason: host reimage
  • 20:29 cmooney@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1006.eqiad.wmnet with reason: host reimage
  • 20:28 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on tcp-proxy4002.ulsfo.wmnet with reason: host reimage
  • 20:23 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on tcp-proxy6001.drmrs.wmnet with reason: host reimage
  • 20:21 cjming@deploy2002: Finished scap sync-world: Backport for Add config for xLab MW Module experiment (T401705), hCaptcha: Define HCaptchaSiteKey in CommonSettings.php (T405586) (duration: 15m 31s)
  • 20:11 cjming@deploy2002: kharlan, cjming: Continuing with sync
  • 20:10 cjming@deploy2002: kharlan, cjming: Backport for Add config for xLab MW Module experiment (T401705), hCaptcha: Define HCaptchaSiteKey in CommonSettings.php (T405586) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:06 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy4002.ulsfo.wmnet with OS trixie
  • 20:06 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:06 cmooney@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1006.eqiad.wmnet with OS trixie
  • 20:06 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy4002.ulsfo.wmnet - dzahn@cumin2002"
  • 20:06 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy4002.ulsfo.wmnet - dzahn@cumin2002"
  • 20:05 cjming@deploy2002: Started scap sync-world: Backport for Add config for xLab MW Module experiment (T401705), hCaptcha: Define HCaptchaSiteKey in CommonSettings.php (T405586)
  • 20:05 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy4002.ulsfo.wmnet on all recursors
  • 20:05 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy4002.ulsfo.wmnet on all recursors
  • 20:05 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:05 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy4002.ulsfo.wmnet - dzahn@cumin2002"
  • 20:05 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy4002.ulsfo.wmnet - dzahn@cumin2002"
  • 20:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 20:03 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit2002-dev.codfw.wmnet with OS trixie
  • 19:58 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 19:58 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy4002.ulsfo.wmnet
  • 19:52 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host tcp-proxy5002.eqsin.wmnet
  • 19:52 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tcp-proxy5002.eqsin.wmnet with OS trixie
  • 19:47 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit2001-dev.codfw.wmnet with OS trixie
  • 19:44 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2002-dev.codfw.wmnet with reason: host reimage
  • 19:43 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy6001.drmrs.wmnet with OS trixie
  • 19:41 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy6001.drmrs.wmnet - dzahn@cumin2002"
  • 19:41 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy6001.drmrs.wmnet - dzahn@cumin2002"
  • 19:41 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy6001.drmrs.wmnet on all recursors
  • 19:41 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy6001.drmrs.wmnet on all recursors
  • 19:41 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:41 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy6001.drmrs.wmnet - dzahn@cumin2002"
  • 19:39 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy6001.drmrs.wmnet - dzahn@cumin2002"
  • 19:37 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2002-dev.codfw.wmnet with reason: host reimage
  • 19:34 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 19:34 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy6001.drmrs.wmnet
  • 19:31 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tcp-proxy5002.eqsin.wmnet with reason: host reimage
  • 19:27 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2001-dev.codfw.wmnet with reason: host reimage
  • 19:25 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on tcp-proxy5002.eqsin.wmnet with reason: host reimage
  • 19:24 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2001-dev.codfw.wmnet with reason: host reimage
  • 19:21 bvibber@deploy2002: Finished scap sync-world: Backport for Squashed diff to master (duration: 41m 12s)
  • 19:19 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudrabbit2002-dev.codfw.wmnet with OS trixie
  • 19:14 sukhe: sudo cumin "A:cp" "disable-puppet 'merging CR 1198424'"
  • 19:07 bvibber@deploy2002: bvibber, mlitn: Continuing with sync
  • 19:06 bvibber@deploy2002: bvibber, mlitn: Backport for Squashed diff to master synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 19:05 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudrabbit2001-dev.codfw.wmnet with OS trixie
  • 18:48 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:43 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 18:40 bvibber@deploy2002: Started scap sync-world: Backport for Squashed diff to master
  • 18:34 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy5002.eqsin.wmnet with OS trixie
  • 18:32 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy5002.eqsin.wmnet - dzahn@cumin2002"
  • 18:31 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy5002.eqsin.wmnet - dzahn@cumin2002"
  • 18:31 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy5002.eqsin.wmnet on all recursors
  • 18:31 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy5002.eqsin.wmnet on all recursors
  • 18:31 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:31 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy5002.eqsin.wmnet - dzahn@cumin2002"
  • 18:30 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy5002.eqsin.wmnet - dzahn@cumin2002"
  • 18:28 musikanimal@deploy2002: Finished scap sync-world: Backport for CodeMirrorWikiEditor: fix selector usurping WikiEditor's search btn (T404543) (duration: 09m 29s)
  • 18:27 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 18:27 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy5002.eqsin.wmnet
  • 18:24 musikanimal@deploy2002: musikanimal: Continuing with sync
  • 18:23 musikanimal@deploy2002: musikanimal: Backport for CodeMirrorWikiEditor: fix selector usurping WikiEditor's search btn (T404543) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:19 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host tcp-proxy5001.eqsin.wmnet
  • 18:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tcp-proxy5001.eqsin.wmnet with OS trixie
  • 18:19 musikanimal@deploy2002: Started scap sync-world: Backport for CodeMirrorWikiEditor: fix selector usurping WikiEditor's search btn (T404543)
  • 18:09 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 17:58 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tcp-proxy5001.eqsin.wmnet with reason: host reimage
  • 17:55 ammarpad@deploy2002: mwscript-k8s job started: refreshImageMetadata.php --start=2018-05-11_Joensuu_station_4.jpg --end=2018-05-11_Joensuu_station_4.jpg --wiki=commonswiki --force # T223051
  • 17:55 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on tcp-proxy5001.eqsin.wmnet with reason: host reimage
  • 17:55 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit2003-dev.codfw.wmnet with OS trixie
  • 17:41 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:40 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 17:40 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:40 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 17:35 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:35 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2003-dev.codfw.wmnet with reason: host reimage
  • 17:35 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:35 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:34 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:34 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:33 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 17:31 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2003-dev.codfw.wmnet with reason: host reimage
  • 17:30 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1019.eqiad.wmnet
  • 17:28 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:28 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
  • 17:28 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:28 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:27 sukhe@cumin1003: START - Cookbook sre.hosts.reboot-single for host lvs1019.eqiad.wmnet
  • 17:26 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 17:21 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
  • 17:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:15 swfrench@deploy2002: Finished scap sync-world: Backport for Enroll 5% of client sessions in PHP 8.3 (T405955) (duration: 10m 44s)
  • 17:13 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudrabbit2003-dev.codfw.wmnet with OS trixie
  • 17:09 swfrench@deploy2002: swfrench: Continuing with sync
  • 17:06 swfrench@deploy2002: swfrench: Backport for Enroll 5% of client sessions in PHP 8.3 (T405955) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 17:04 swfrench@deploy2002: Started scap sync-world: Backport for Enroll 5% of client sessions in PHP 8.3 (T405955)
  • 17:03 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy5001.eqsin.wmnet with OS trixie
  • 17:00 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy5001.eqsin.wmnet - dzahn@cumin2002"
  • 17:00 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy5001.eqsin.wmnet - dzahn@cumin2002"
  • 16:59 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy5001.eqsin.wmnet on all recursors
  • 16:59 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy5001.eqsin.wmnet on all recursors
  • 16:59 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:59 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy5001.eqsin.wmnet - dzahn@cumin2002"
  • 16:59 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy5001.eqsin.wmnet - dzahn@cumin2002"
  • 16:55 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 16:55 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy5001.eqsin.wmnet
  • 16:48 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 16:38 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1019.eqiad.wmnet with reason: reboot
  • 16:08 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1018.eqiad.wmnet
  • 16:05 sukhe@cumin1003: START - Cookbook sre.hosts.reboot-single for host lvs1018.eqiad.wmnet
  • 15:59 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 15:56 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010
  • 15:55 elukey@cumin2002: START - Cookbook sre.hosts.powercycle for host sretest2010
  • 15:46 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010
  • 15:44 elukey@cumin2002: START - Cookbook sre.hosts.powercycle for host sretest2010
  • 15:40 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1018.eqiad.wmnet with reason: reboot
  • 15:38 swfrench-wmf: rolling run-puppet-agent on A:cp hosts for haproxy config change
  • 15:36 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010
  • 15:34 elukey@cumin2002: START - Cookbook sre.hosts.powercycle for host sretest2010
  • 15:31 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 15:31 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 15:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts failoid1002.eqiad.wmnet
  • 15:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:29 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: failoid1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 15:27 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.powercycle-single (exit_code=99) for host sretest2010
  • 15:27 elukey@cumin2002: START - Cookbook sre.hosts.powercycle-single for host sretest2010
  • 15:25 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: failoid1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 15:23 swfrench-wmf: disable-puppet on A:cp hosts for haproxy config change
  • 15:19 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 15:14 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts failoid1002.eqiad.wmnet
  • 15:01 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 15:00 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 15:00 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 14:59 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 14:59 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 14:58 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 14:58 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 14:58 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:58 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 14:58 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 14:57 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:57 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:56 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:56 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:53 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 14:46 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1016.eqiad.wmnet
  • 14:45 dancy@deploy2002: Installation of scap version "4.216.0" completed for 165 hosts
  • 14:44 sukhe@cumin1003: START - Cookbook sre.hosts.reboot-single for host lvs1016.eqiad.wmnet
  • 14:41 dancy@deploy2002: Installing scap version "4.216.0" for 165 host(s)
  • 14:38 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 14:34 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1016.eqiad.wmnet with reason: reboot
  • 14:29 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 14:29 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:24 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 14:23 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 14:20 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:20 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 14:19 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:05 zabe@deploy2002: mwscript-k8s job started: extensions/WikimediaMaintenance/T389026.php --wiki=itwikivoyage # T389026
  • 14:03 zabe@deploy2002: mwscript-k8s job started: extensions/WikimediaMaintenance/T389026.php --wiki=dewikivoyage # T389026
  • 14:02 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:02 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 13:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts failoid2002.codfw.wmnet
  • 13:52 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:52 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: failoid2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 13:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: failoid2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 13:43 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 13:41 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:36 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts failoid2002.codfw.wmnet
  • 13:29 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Remove es2026 from dbctl T408385', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20251027-130212-fceratto.json
  • 12:56 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 12:56 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 12:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-growthbook: apply
  • 12:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-growthbook: apply
  • 12:40 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:40 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:40 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 12:40 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 12:39 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 12:38 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 12:32 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:31 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 12:31 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:30 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 12:29 fceratto@cumin1003: dbctl commit (dc=all): 'Depool es2026 T408385', diff saved to https://phabricator.wikimedia.org/P84307 and previous config saved to /var/cache/conftool/dbconfig/20251027-122946-fceratto.json
  • 12:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 12:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 12:18 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp-test1004.wikimedia.org
  • 12:17 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:17 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test1004.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1003"
  • 12:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 12:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 12:17 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test1004.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1003"
  • 12:16 moritzm: installing Java 21 security updates
  • 12:13 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
  • 12:08 slyngshede@cumin1003: START - Cookbook sre.hosts.decommission for hosts idp-test1004.wikimedia.org
  • 11:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 11:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 11:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-growthbook: apply
  • 11:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-growthbook: apply
  • 11:37 zabe@deploy2002: Finished scap sync-world: Backport for Using Hadoop for MostTranscludedPages on testwiki (T309738) (duration: 11m 02s)
  • 11:31 zabe@deploy2002: zabe: Continuing with sync
  • 11:30 zabe@deploy2002: zabe: Backport for Using Hadoop for MostTranscludedPages on testwiki (T309738) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:26 zabe@deploy2002: Started scap sync-world: Backport for Using Hadoop for MostTranscludedPages on testwiki (T309738)
  • 11:17 zabe@deploy2002: Finished scap sync-world: Backport for Correctly check if value is not false (duration: 12m 08s)
  • 11:09 zabe@deploy2002: zabe: Continuing with sync
  • 11:08 zabe@deploy2002: zabe: Backport for Correctly check if value is not false synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:04 zabe@deploy2002: Started scap sync-world: Backport for Correctly check if value is not false
  • 11:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp2004.wikimedia.org
  • 10:59 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:59 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp2004.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1003"
  • 10:59 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp2004.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1003"
  • 10:53 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
  • 10:48 slyngshede@cumin1003: START - Cookbook sre.hosts.decommission for hosts idp2004.wikimedia.org
  • 10:34 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp1004.wikimedia.org
  • 10:34 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:34 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp1004.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1003"
  • 10:33 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp1004.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1003"
  • 10:29 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
  • 10:24 slyngshede@cumin1003: START - Cookbook sre.hosts.decommission for hosts idp1004.wikimedia.org
  • 09:57 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 09:57 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 09:57 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 09:56 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 09:56 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 09:56 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 09:56 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 09:56 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 09:40 slyngshede@cumin1003: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts idp1004.wikimedia.org
  • 09:39 slyngshede@cumin1003: START - Cookbook sre.hosts.decommission for hosts idp1004.wikimedia.org
  • 09:37 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:34 cmooney@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ssw1-d8-eqiad
  • 09:34 cmooney@cumin1003: START - Cookbook sre.hosts.remove-downtime for ssw1-d8-eqiad
  • 09:25 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki 'Lingua Libre/SignIt' SignIt Ammarpad --reason 'requested at phab:T408314' # T408314
  • 08:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 08:57 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 08:37 dcausse: closing UTC morning backport window
  • 08:36 dcausse@deploy2002: Finished scap sync-world: Backport for Revert^2 "cirrus: enable completion search with defaultsort A/B test" (duration: 30m 01s)
  • 08:26 dcausse@deploy2002: dcausse: Continuing with sync
  • 08:10 dcausse@deploy2002: dcausse: Backport for Revert^2 "cirrus: enable completion search with defaultsort A/B test" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:06 dcausse@deploy2002: Started scap sync-world: Backport for Revert^2 "cirrus: enable completion search with defaultsort A/B test"
  • 08:01 dcausse@deploy2002: Finished scap sync-world: Backport for ext.xLab: Implement UnenrolledExperiment#setStream(), ext.xLab: Implement OverriddenExperiment#setStream(), CompletionSuggester: fix index id format check (T404858) (duration: 46m 13s)
  • 07:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 07:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 07:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 07:48 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 07:48 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 07:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 07:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 07:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 07:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 07:46 dcausse@deploy2002: cjming, dcausse: Continuing with sync
  • 07:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 07:40 dcausse@deploy2002: cjming, dcausse: Backport for ext.xLab: Implement UnenrolledExperiment#setStream(), ext.xLab: Implement OverriddenExperiment#setStream(), CompletionSuggester: fix index id format check (T404858) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:15 dcausse@deploy2002: Started scap sync-world: Backport for ext.xLab: Implement UnenrolledExperiment#setStream(), ext.xLab: Implement OverriddenExperiment#setStream(), CompletionSuggester: fix index id format check (T404858)
  • 01:18 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 18m 03s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-26

  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 23s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-25

  • 18:18 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 18:18 reedy@deploy2002: Finished scap sync-world: Backport for RecoveryCodeStatusForm: Don't assume there's only one recovery code (T408294) (duration: 17m 08s)
  • 18:12 reedy@deploy2002: reedy: Continuing with sync
  • 18:05 reedy@deploy2002: reedy: Backport for RecoveryCodeStatusForm: Don't assume there's only one recovery code (T408294) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:01 reedy@deploy2002: Started scap sync-world: Backport for RecoveryCodeStatusForm: Don't assume there's only one recovery code (T408294)
  • 17:57 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 17:42 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 17:41 andrew@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin2002"
  • 17:40 andrew@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin2002"
  • 17:22 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1008-dev.eqiad.wmnet with reason: host reimage
  • 17:15 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1008-dev.eqiad.wmnet with reason: host reimage
  • 16:59 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 16:59 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 16:45 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 16:45 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 16:16 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 01:13 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 09s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-24

  • 21:54 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2006-dev.codfw.wmnet with OS trixie
  • 21:53 ryankemper: [WDQS] See https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&refresh=1m&var-cluster_name=wdqs-main&from=2025-10-24T20:06:49.223Z&to=2025-10-24T21:51:54.665Z&timezone=utc&var-graph_type=%289102%7C919%5B35%5D%29&viewPanel=panel-7 for initial 2 hours of instability
  • 21:52 ryankemper: [WDQS] We're experiencing intermittent difficulty keeping up with the volume of updates. We've started seeing a very large spike in new triples around `2025-10-24 20:09:00`
  • 21:51 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host tcp-proxy4001.ulsfo.wmnet
  • 21:51 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tcp-proxy4001.ulsfo.wmnet with OS trixie
  • 21:34 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tcp-proxy4001.ulsfo.wmnet with reason: host reimage
  • 21:28 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on tcp-proxy4001.ulsfo.wmnet with reason: host reimage
  • 21:26 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
  • 21:26 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
  • 21:22 ryankemper@cumin2002: END (ERROR) - Cookbook sre.wdqs.restart (exit_code=97)
  • 21:22 ryankemper@cumin2002: START - Cookbook sre.wdqs.restart
  • 21:10 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt2006-dev.codfw.wmnet with OS trixie
  • 21:05 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy4001.ulsfo.wmnet with OS trixie
  • 21:01 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy4001.ulsfo.wmnet - dzahn@cumin2002"
  • 21:01 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy4001.ulsfo.wmnet - dzahn@cumin2002"
  • 21:01 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy4001.ulsfo.wmnet on all recursors
  • 21:01 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy4001.ulsfo.wmnet on all recursors
  • 21:01 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:01 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy4001.ulsfo.wmnet - dzahn@cumin2002"
  • 21:01 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy4001.ulsfo.wmnet - dzahn@cumin2002"
  • 20:57 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 20:57 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy4001.ulsfo.wmnet
  • 20:45 dzahn@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host tcp-proxy2001.codfw.wmnet
  • 20:45 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy2001.codfw.wmnet on all recursors
  • 20:45 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy2001.codfw.wmnet on all recursors
  • 20:45 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:45 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 20:45 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 20:42 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 20:42 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy2001.codfw.wmnet on all recursors
  • 20:42 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy2001.codfw.wmnet on all recursors
  • 20:42 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:41 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 20:41 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 20:36 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 20:36 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy2001.codfw.wmnet
  • 20:29 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 20:29 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 20:24 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 20:24 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 20:22 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 20:22 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 20:21 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2005-dev.codfw.wmnet with OS trixie
  • 20:15 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:13 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 20:03 dzahn@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host tcp-proxy2001.codfw.wmnet
  • 20:03 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy2001.codfw.wmnet on all recursors
  • 20:03 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy2001.codfw.wmnet on all recursors
  • 20:03 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:03 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 20:03 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 19:59 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 19:59 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy2001.codfw.wmnet on all recursors
  • 19:59 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy2001.codfw.wmnet on all recursors
  • 19:59 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:59 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 19:59 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 19:56 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:56 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 19:55 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy2001.codfw.wmnet
  • 19:55 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:54 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:54 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:54 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:45 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:45 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:44 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:44 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:44 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
  • 19:43 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:43 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:39 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
  • 19:38 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:36 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 19:28 dzahn@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host tcp-proxy2001.codfw.wmnet
  • 19:28 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy2001.codfw.wmnet on all recursors
  • 19:28 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy2001.codfw.wmnet on all recursors
  • 19:28 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:28 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 19:28 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 19:24 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 19:24 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy2001.codfw.wmnet on all recursors
  • 19:24 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy2001.codfw.wmnet on all recursors
  • 19:24 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:24 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 19:24 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 19:23 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt2005-dev.codfw.wmnet with OS trixie
  • 19:13 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 19:13 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy2001.codfw.wmnet
  • 19:08 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:05 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 19:00 dzahn@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host tcp-proxy2001.codfw.wmnet
  • 19:00 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy2001.codfw.wmnet on all recursors
  • 19:00 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy2001.codfw.wmnet on all recursors
  • 19:00 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:00 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 19:00 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 18:56 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 18:56 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy2001.codfw.wmnet on all recursors
  • 18:56 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy2001.codfw.wmnet on all recursors
  • 18:56 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:56 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 18:56 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 18:52 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 18:52 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy2001.codfw.wmnet
  • 18:50 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:47 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 18:46 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:46 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge pending changes - sukhe@cumin1003"
  • 18:46 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge pending changes - sukhe@cumin1003"
  • 18:43 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:43 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:43 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 18:18 cdobbins@puppetserver1001: conftool action : get/pooled=no; selector: service=cdn
  • 18:16 cdobbins@puppetserver1001: conftool action : get/weight=1; selector: service=cdn
  • 17:47 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:42 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 16:44 ejegg: fundraising civicrm upgraded from 1bade506 to 3819e60c
  • 16:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2078.codfw.wmnet with OS bookworm
  • 16:25 mutante: codesearch9.codesearch systemctl restart hound-operations T408218
  • 16:18 mutante: codesearch9.codesearch truncate -s 0 /var/log/account/pacct -> disk space from 100% used to 37% used T408221 T408218
  • 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 16:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 16:05 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 15:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 15:46 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2078.codfw.wmnet with OS bookworm
  • 15:46 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:46 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2078.codfw.wmnet with OS trixie
  • 15:46 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:42 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
  • 15:33 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 15:33 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 15:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 15:16 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1006.eqiad.wmnet with OS trixie
  • 15:13 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 15:04 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
  • 14:59 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
  • 14:55 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2078.codfw.wmnet with OS trixie
  • 14:53 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2078.codfw.wmnet with OS trixie
  • 14:37 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
  • 14:33 filippo@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 14:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 14:24 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 14:05 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2078.codfw.wmnet with OS trixie
  • 13:59 filippo@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 13:56 cmooney@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1006.eqiad.wmnet with OS trixie
  • 13:44 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddumps1002.wikimedia.org
  • 13:37 filippo@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 13:37 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddumps1001.wikimedia.org
  • 13:35 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host clouddumps1002.wikimedia.org
  • 13:28 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host clouddumps1001.wikimedia.org
  • 13:25 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2078.codfw.wmnet with OS bookworm
  • 13:23 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host clouddumps1001.wikimedia.org
  • 13:23 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host clouddumps1001.wikimedia.org
  • 13:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 12:59 godog: temp disable "automatically reboot after install" d-i options on apt1002
  • 12:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 12:54 filippo@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 12:41 filippo@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2010-dev']
  • 12:40 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2078.codfw.wmnet with OS bookworm
  • 12:34 sukhe: sudo manage_principals.py reset-password fabfur --email_address=ffurnari@wikimedia.org: T408193
  • 12:34 filippo@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2010-dev']
  • 12:34 filippo@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2010-dev']
  • 12:24 filippo@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2010-dev']
  • 12:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2078.codfw.wmnet with OS bookworm
  • 12:09 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1006.eqiad.wmnet with OS trixie
  • 11:49 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 11:45 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 11:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 11:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 11:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-growthbook: apply
  • 11:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-growthbook: apply
  • 11:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-growthbook: apply
  • 11:26 fceratto@cumin1003: END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99)
  • 11:26 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade
  • 11:26 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2078.codfw.wmnet with OS bookworm
  • 11:26 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
  • 11:25 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade
  • 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 11:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2078.codfw.wmnet with OS bookworm
  • 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 10:53 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 10:49 cmooney@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1006.eqiad.wmnet with OS trixie
  • 10:47 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 10:28 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2078.codfw.wmnet with OS bookworm
  • 10:11 cmooney@dns2005: END - running authdns-update
  • 10:10 cmooney@dns2005: START - running authdns-update
  • 08:43 gehel: cleanup old jar files on an-worker nodes - T396582 - sudo cumin A:hadoop-worker 'find /tmp -name *.jar -mtime +30 -delete'
  • 08:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 07:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 07:37 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol2010-dev.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 07:26 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudcontrol2010-dev.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 07:22 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol2010-dev.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 07:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudcontrol2010-dev.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 07:06 krinkle@deploy2002: Finished scap sync-world: Backport for wmf-config: Stop sending HTTP purges for mobile domains (T405931) (duration: 13m 35s)
  • 07:00 krinkle@deploy2002: krinkle: Continuing with sync
  • 06:57 krinkle@deploy2002: krinkle: Backport for wmf-config: Stop sending HTTP purges for mobile domains (T405931) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 06:53 krinkle@deploy2002: Started scap sync-world: Backport for wmf-config: Stop sending HTTP purges for mobile domains (T405931)
  • 02:59 ejegg: payments-wiki upgraded from 3753f979 to 5f72d7b3
  • 01:15 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 45s)
  • 01:01 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-23

  • 23:57 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1022.eqiad.wmnet, repooling both afterwards
  • 23:46 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1022.eqiad.wmnet, repooling both afterwards
  • 23:46 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1021.eqiad.wmnet, repooling both afterwards
  • 23:35 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1021.eqiad.wmnet, repooling both afterwards
  • 23:35 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1020.eqiad.wmnet, repooling both afterwards
  • 23:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1020.eqiad.wmnet, repooling both afterwards
  • 23:23 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1019.eqiad.wmnet, repooling both afterwards
  • 23:12 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1019.eqiad.wmnet, repooling both afterwards
  • 23:12 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1018.eqiad.wmnet, repooling both afterwards
  • 23:01 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1018.eqiad.wmnet, repooling both afterwards
  • 23:01 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1017.eqiad.wmnet, repooling both afterwards
  • 22:50 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1017.eqiad.wmnet, repooling both afterwards
  • 22:50 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling both afterwards
  • 22:49 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2004-dev.codfw.wmnet with OS trixie
  • 22:38 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling both afterwards
  • 22:38 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1015.eqiad.wmnet, repooling both afterwards
  • 22:30 swfrench@deploy2002: Finished scap sync-world: Backport for Reenable enrollment in PHP 8.3 at 1% (T405955) (duration: 12m 10s)
  • 22:27 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1015.eqiad.wmnet, repooling both afterwards
  • 22:27 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1014.eqiad.wmnet, repooling both afterwards
  • 22:23 swfrench@deploy2002: swfrench: Continuing with sync
  • 22:22 swfrench@deploy2002: swfrench: Backport for Reenable enrollment in PHP 8.3 at 1% (T405955) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:17 swfrench@deploy2002: Started scap sync-world: Backport for Reenable enrollment in PHP 8.3 at 1% (T405955)
  • 22:15 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1014.eqiad.wmnet, repooling both afterwards
  • 22:15 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1013.eqiad.wmnet, repooling both afterwards
  • 22:15 swfrench@deploy2002: Finished scap sync-world: Return next/migration releases to 8.3 - T405955 (duration: 09m 52s)
  • 22:14 cwhite: restart apache2 on gerrit1003
  • 22:05 swfrench@deploy2002: Started scap sync-world: Return next/migration releases to 8.3 - T405955
  • 22:05 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
  • 22:04 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1013.eqiad.wmnet, repooling both afterwards
  • 22:04 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1012.eqiad.wmnet, repooling both afterwards
  • 22:00 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
  • 21:52 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1012.eqiad.wmnet, repooling both afterwards
  • 21:44 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt2004-dev.codfw.wmnet with OS trixie
  • 21:43 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1026.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 21:42 krinkle@deploy2002: Finished scap sync-world: Backport for MentorDashboard,UserImpact: Bump cache and set proper keygroup (T407403), MentorDashboard,UserImpact: Bump cache and set proper keygroup (T407403) (duration: 10m 34s)
  • 21:37 krinkle@deploy2002: krinkle: Continuing with sync
  • 21:33 krinkle@deploy2002: krinkle: Backport for MentorDashboard,UserImpact: Bump cache and set proper keygroup (T407403), MentorDashboard,UserImpact: Bump cache and set proper keygroup (T407403) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:31 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1026.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 21:31 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1025.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 21:31 krinkle@deploy2002: Started scap sync-world: Backport for MentorDashboard,UserImpact: Bump cache and set proper keygroup (T407403), MentorDashboard,UserImpact: Bump cache and set proper keygroup (T407403)
  • 21:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1025.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 21:19 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
  • 21:12 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 21:03 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 21:00 krinkle@deploy2002: Finished scap sync-world: Backport for fix(MentorDashboard): fix caching for PHP 8.1 -> 8.3 migration (T407403) (duration: 15m 06s)
  • 20:56 krinkle@deploy2002: krinkle: Continuing with sync
  • 20:49 krinkle@deploy2002: krinkle: Backport for fix(MentorDashboard): fix caching for PHP 8.1 -> 8.3 migration (T407403) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:45 krinkle@deploy2002: Started scap sync-world: Backport for fix(MentorDashboard): fix caching for PHP 8.1 -> 8.3 migration (T407403)
  • 20:38 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
  • 20:34 jsn@deploy2002: Finished scap sync-world: Backport for Revert "Set AutoModeratorMultiLingualRevertRisk with available wikis" (duration: 08m 53s)
  • 20:33 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
  • 20:30 jsn@deploy2002: jsn: Continuing with sync
  • 20:29 jsn@deploy2002: jsn: Backport for Revert "Set AutoModeratorMultiLingualRevertRisk with available wikis" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:25 jsn@deploy2002: Started scap sync-world: Backport for Revert "Set AutoModeratorMultiLingualRevertRisk with available wikis"
  • 20:23 dzahn@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host tcp-proxy2001.codfw.wmnet
  • 20:23 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy2001.codfw.wmnet on all recursors
  • 20:23 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy2001.codfw.wmnet on all recursors
  • 20:23 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:23 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 20:23 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 20:22 jsn@deploy2002: Sync cancelled.
  • 20:20 jsn@deploy2002: jsn, kgraessle: Backport for Set AutoModeratorMultiLingualRevertRisk with available wikis (T400727) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:17 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 20:16 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy2001.codfw.wmnet on all recursors
  • 20:16 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy2001.codfw.wmnet on all recursors
  • 20:16 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:16 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 20:16 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 20:16 jsn@deploy2002: Started scap sync-world: Backport for Set AutoModeratorMultiLingualRevertRisk with available wikis (T400727)
  • 20:14 cjming@deploy2002: Finished scap sync-world: Backport for Add config for xLab MW Module experiment (T401705) (duration: 11m 23s)
  • 20:14 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
  • 20:13 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 20:13 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy2001.codfw.wmnet
  • 20:12 dzahn@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host tcp-proxy2001.codfw.wmnet
  • 20:12 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:10 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 20:10 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 20:10 cjming@deploy2002: cjming: Continuing with sync
  • 20:08 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 20:08 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy2001.codfw.wmnet
  • 20:08 dzahn@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host tcp-proxy2001.codfw.wmnet
  • 20:08 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy2001.codfw.wmnet on all recursors
  • 20:07 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy2001.codfw.wmnet on all recursors
  • 20:07 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:07 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 20:07 cjming@deploy2002: cjming: Backport for Add config for xLab MW Module experiment (T401705) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:07 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 20:03 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 20:03 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy2001.codfw.wmnet on all recursors
  • 20:03 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy2001.codfw.wmnet on all recursors
  • 20:03 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:03 cjming@deploy2002: Started scap sync-world: Backport for Add config for xLab MW Module experiment (T401705)
  • 20:02 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tcp-proxy1001.eqiad.wmnet with OS trixie
  • 20:00 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 20:00 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy2001.codfw.wmnet
  • 19:50 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tcp-proxy1001.eqiad.wmnet with reason: host reimage
  • 19:46 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on tcp-proxy1001.eqiad.wmnet with reason: host reimage
  • 19:37 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host tcp-proxy1001.eqiad.wmnet with OS trixie
  • 19:33 dzahn@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host tcp-proxy2001.codfw.wmnet
  • 19:32 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy2001.codfw.wmnet on all recursors
  • 19:32 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy2001.codfw.wmnet on all recursors
  • 19:32 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:30 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 19:30 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy2001.codfw.wmnet on all recursors
  • 19:30 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy2001.codfw.wmnet on all recursors
  • 19:29 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 19:26 dzahn@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host tcp-proxy1001.eqiad.wmnet
  • 19:26 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host tcp-proxy1001.eqiad.wmnet with OS trixie
  • 19:25 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 19:25 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy2001.codfw.wmnet
  • 19:25 dzahn@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host tcp-proxy2001.codfw.wmnet
  • 19:25 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host tcp-proxy2001.codfw.wmnet with OS trixie
  • 19:06 dancy@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.24 refs T405680
  • 18:54 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:53 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 18:30 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy2001.codfw.wmnet with OS trixie
  • 18:29 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 18:29 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 18:29 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy2001.codfw.wmnet on all recursors
  • 18:29 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy2001.codfw.wmnet on all recursors
  • 18:29 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:29 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 18:29 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy2001.codfw.wmnet - dzahn@cumin2002"
  • 18:27 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host tcp-proxy1001.eqiad.wmnet with OS trixie
  • 18:26 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy1001.eqiad.wmnet - dzahn@cumin1002"
  • 18:25 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy1001.eqiad.wmnet - dzahn@cumin1002"
  • 18:25 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 18:25 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy2001.codfw.wmnet
  • 18:25 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy1001.eqiad.wmnet on all recursors
  • 18:25 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache tcp-proxy1001.eqiad.wmnet on all recursors
  • 18:25 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:25 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy1001.eqiad.wmnet - dzahn@cumin1002"
  • 18:25 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy1001.eqiad.wmnet - dzahn@cumin1002"
  • 18:15 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2057 slowly with 10 steps - Pooling in new host
  • 18:10 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 18:10 dzahn@cumin1002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy1001.eqiad.wmnet
  • 16:58 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2002.codfw.wmnet with OS bookworm
  • 16:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 16:57 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
  • 16:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:51 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add reverses for 10.64.186.1 - cmooney@cumin1003"
  • 16:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add reverses for 10.64.186.1 - cmooney@cumin1003"
  • 16:48 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 16:35 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
  • 16:31 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 16:30 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 16:28 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
  • 16:25 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 16:25 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 16:20 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 16:20 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 16:14 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 16:13 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 16:07 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2002.codfw.wmnet with OS bookworm
  • 16:06 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host maps-test2002.codfw.wmnet with OS trixie
  • 16:05 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on sretest2001.codfw.wmnet with reason: T383173
  • 15:58 fceratto@cumin1003: START - Cookbook sre.mysql.pool es2057 slowly with 10 steps - Pooling in new host
  • 15:58 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
  • 15:53 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
  • 15:44 jhancock@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2203']
  • 15:43 jhancock@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2203']
  • 15:43 jhancock@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2203']
  • 15:41 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ms-be2078.codfw.wmnet
  • 15:41 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es2057.codfw.wmnet
  • 15:41 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es2057.codfw.wmnet
  • 15:41 fceratto@cumin1003: dbctl commit (dc=all): 'Add es2057 T402859', diff saved to https://phabricator.wikimedia.org/P84277 and previous config saved to /var/cache/conftool/dbconfig/20251023-154056-fceratto.json
  • 15:40 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-be2078.codfw.wmnet
  • 15:38 cmooney@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1006.eqiad.wmnet with OS trixie
  • 15:38 brett@dns1004: END - running authdns-update
  • 15:37 brett@dns1004: START - running authdns-update
  • 15:36 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2002.codfw.wmnet with OS trixie
  • 15:32 jhancock@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2203']
  • 15:32 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host maps-test2002.codfw.wmnet with OS trixie
  • 15:29 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ms-be2078.codfw.wmnet
  • 15:28 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-be2078.codfw.wmnet
  • 15:28 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1005.eqiad.wmnet with OS trixie
  • 15:28 elukey@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
  • 15:16 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2002.codfw.wmnet with OS trixie
  • 15:15 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host maps-test2002.codfw.wmnet with OS trixie
  • 15:10 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
  • 15:02 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
  • 14:57 elukey@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
  • 14:53 cmooney@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "run sync to add new nokia switches - cmooney@cumin1003 - T405558"
  • 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "run sync to add new nokia switches - cmooney@cumin1003 - T405558"
  • 14:43 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2002.codfw.wmnet with OS trixie
  • 14:42 stevemunene@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/echoserver: apply
  • 14:41 stevemunene@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/echoserver: apply
  • 14:40 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 14:40 cmooney@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1006.eqiad.wmnet with OS trixie
  • 14:38 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1005.eqiad.wmnet with reason: host reimage
  • 14:37 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:31 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1005.eqiad.wmnet with reason: host reimage
  • 14:26 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:14 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS trixie
  • 14:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:03 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-growthbook: apply
  • 14:03 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-growthbook: apply
  • 13:59 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:58 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 13:58 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 13:57 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 13:57 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 13:57 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:57 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 13:56 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 13:56 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
  • 13:55 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
  • 13:54 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
  • 13:54 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
  • 13:53 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 13:53 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 13:53 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 13:52 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 13:52 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 13:52 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 13:51 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 13:51 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
  • 13:51 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
  • 13:50 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 13:49 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 13:48 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 13:47 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 13:47 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 13:47 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 13:46 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 13:46 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 13:46 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 13:45 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 13:45 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 13:45 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 13:44 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 13:44 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 13:43 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 13:43 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 13:43 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 13:43 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 13:42 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 13:42 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 13:42 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 13:42 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 13:42 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 13:41 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 13:41 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 13:41 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 13:40 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 13:40 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: apply
  • 13:40 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/commons-impact-analytics: apply
  • 13:40 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
  • 13:39 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
  • 13:39 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1005.eqiad.wmnet with OS trixie
  • 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 13:39 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 13:32 lucaswerkmeister-wmde@deploy2002: mwscript-k8s job started: foreachwikiindblist sul CentralAuth:FixRenameUserLocalLogs --logwiki=metawiki --batch-size=25 # T398177 (dry run)
  • 13:30 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS trixie
  • 13:29 inflatador: bking@cumin2002 `sudo cumin 'A:wdqs-main and A:codfw' 'depool ; systemctl restart wdqs-blazegraph ; sleep 30 ; pool'`
  • 13:28 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1005.eqiad.wmnet with OS trixie
  • 13:27 Lucas_WMDE: (cont.) Finished scap sync-world: Backport for … cswiktionary:Disable subpages in the main namespace (T406728) (duration: 09m 41s)
  • 13:27 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for FixRenameUserLocalLogs: Check for same log_actor between local and global log entry (T398177), FixRenameUserLocalLogs: Check for same log_actor between local and global log entry (T398177), jawiki: Add ipblock-exempt to the accountcreator user group (T407855), [[gerrit:1198283|cswiktionary:
  • 13:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1020.eqiad.wmnet
  • 13:23 lucaswerkmeister-wmde@deploy2002: asmartkitten, lucaswerkmeister-wmde, matmarex, dragoniez: Continuing with sync
  • 13:22 lucaswerkmeister-wmde@deploy2002: asmartkitten, lucaswerkmeister-wmde, matmarex, dragoniez: Backport for FixRenameUserLocalLogs: Check for same log_actor between local and global log entry (T398177), FixRenameUserLocalLogs: Check for same log_actor between local and global log entry (T398177), jawiki: Add ipblock-exempt to the accountcreator user group (T407855)
  • 13:18 Lucas_WMDE: (cont) Started scap sync-world: Backport for … cswiktionary:Disable subpages in the main namespace (T406728)
  • 13:17 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for FixRenameUserLocalLogs: Check for same log_actor between local and global log entry (T398177), FixRenameUserLocalLogs: Check for same log_actor between local and global log entry (T398177), jawiki: Add ipblock-exempt to the accountcreator user group (T407855), [[gerrit:1198283|cswiktionary:
  • 13:17 sukhe@cumin1003: START - Cookbook sre.hosts.reboot-single for host lvs1020.eqiad.wmnet
  • 13:16 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2014.codfw.wmnet
  • 13:15 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Add virtual domain mapping for OAuth (T348485) (duration: 10m 48s)
  • 13:10 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, d3r1ck01: Continuing with sync
  • 13:10 sukhe@cumin1003: START - Cookbook sre.hosts.reboot-single for host lvs2014.codfw.wmnet
  • 13:08 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, d3r1ck01: Backport for Add virtual domain mapping for OAuth (T348485) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:04 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Add virtual domain mapping for OAuth (T348485)
  • 13:00 kart_: Update Recommendation API to 2025-10-22-134201-production (T407895, T407894)
  • 12:58 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS trixie
  • 12:58 kartik@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 12:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:53 kartik@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 12:50 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 12:48 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:46 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:45 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:32 kart_: cxserver: Remove Yandex MT service (T407345)
  • 12:32 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 12:31 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 12:31 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 12:30 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 12:22 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:22 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 11:51 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:51 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:49 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 11:48 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 11:47 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2078.codfw.wmnet with OS trixie
  • 11:25 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:24 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 11:08 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 11:03 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:03 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:01 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 11:01 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 11:01 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 11:01 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 10:50 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2078.codfw.wmnet with OS trixie
  • 10:48 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 10:47 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 10:31 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:31 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1089.eqiad.wmnet with OS bullseye
  • 10:31 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:28 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:25 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2034 gradually with 4 steps - Pooling in
  • 10:20 akosiaris@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 10:19 akosiaris@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 10:17 akosiaris@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 10:17 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:16 akosiaris@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 10:15 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1089.eqiad.wmnet with reason: host reimage
  • 10:14 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 10:13 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 10:12 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 10:11 mvernon@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1089.eqiad.wmnet with reason: host reimage
  • 10:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 10:10 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 10:09 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 10:03 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Amire80 out of all services on: 2412 hosts
  • 09:58 mvernon@cumin1003: START - Cookbook sre.hosts.reimage for host ms-be1089.eqiad.wmnet with OS bullseye
  • 09:46 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 09:46 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 09:45 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 09:43 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 09:43 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 09:41 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 09:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 09:40 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 09:39 fceratto@cumin1003: START - Cookbook sre.mysql.pool es2034 gradually with 4 steps - Pooling in
  • 09:39 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 09:39 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 09:37 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es2034.codfw.wmnet
  • 09:37 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es2034.codfw.wmnet
  • 09:33 akosiaris@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 09:24 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 09:23 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 09:23 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 09:22 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 09:22 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 09:21 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 09:20 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 09:20 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 09:18 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 09:18 tstarling@deploy2002: Finished scap sync-world: Backport for recentchanges: Fix incorrect alias in isDenseTagFilter (T408040) (duration: 10m 51s)
  • 09:18 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 09:18 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 09:18 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 09:14 tstarling@deploy2002: tstarling: Continuing with sync
  • 09:12 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 09:12 tstarling@deploy2002: tstarling: Backport for recentchanges: Fix incorrect alias in isDenseTagFilter (T408040) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:11 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 09:11 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 09:11 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 09:10 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 09:09 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 09:07 tstarling@deploy2002: Started scap sync-world: Backport for recentchanges: Fix incorrect alias in isDenseTagFilter (T408040)
  • 09:01 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 09:01 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 09:01 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 09:01 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 09:00 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 09:00 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 09:00 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 09:00 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 09:00 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 09:00 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 09:00 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 09:00 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 09:00 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 09:00 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 08:59 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:59 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:59 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 08:58 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 08:57 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 08:56 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 08:51 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 08:50 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 08:50 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 08:50 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 08:50 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:50 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:49 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 08:49 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 08:49 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 08:49 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 08:31 kharlan@deploy2002: Finished scap sync-world: Backport for Instrument the Suggested investigations feature (T404177) (duration: 12m 35s)
  • 08:27 kharlan@deploy2002: kharlan: Continuing with sync
  • 08:27 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:26 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 08:25 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 08:25 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 08:24 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 08:24 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 08:23 kharlan@deploy2002: kharlan: Backport for Instrument the Suggested investigations feature (T404177) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:19 kharlan@deploy2002: Started scap sync-world: Backport for Instrument the Suggested investigations feature (T404177)
  • 08:14 ryankemper: [WDQS] `ryankemper@cumin2002:~$ sudo -E cumin 'wdqs1014*' 'systemctl restart wdqs-blazegraph'` (restart service to fix 12 hour deadlock)
  • 07:58 dcausse: closing UTC morning backport window
  • 07:56 dcausse@deploy2002: Finished scap sync-world: Backport for Revert "cirrus: enable completion search with defaultsort A/B test" (duration: 09m 38s)
  • 07:52 dcausse@deploy2002: dcausse: Continuing with sync
  • 07:51 dcausse@deploy2002: dcausse: Backport for Revert "cirrus: enable completion search with defaultsort A/B test" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:47 dcausse@deploy2002: Started scap sync-world: Backport for Revert "cirrus: enable completion search with defaultsort A/B test"
  • 07:44 dcausse@deploy2002: Sync cancelled.
  • 07:43 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, sync categories journal) xfer categories from wdqs2008.codfw.wmnet -> wdqs2020.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 07:32 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, sync categories journal) xfer categories from wdqs2008.codfw.wmnet -> wdqs2020.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 07:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, sync categories journal) xfer categories from wdqs2008.codfw.wmnet -> wdqs2019.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 07:21 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, sync categories journal) xfer categories from wdqs2008.codfw.wmnet -> wdqs2019.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 07:20 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, sync categories journal) xfer categories from wdqs2008.codfw.wmnet -> wdqs2018.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 07:18 dcausse@deploy2002: dcausse: Backport for cirrus: enable completion search with defaultsort A/B test (T404858) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:13 dcausse@deploy2002: Started scap sync-world: Backport for cirrus: enable completion search with defaultsort A/B test (T404858)
  • 07:09 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, sync categories journal) xfer categories from wdqs2008.codfw.wmnet -> wdqs2018.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 07:06 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, sync categories journal) xfer categories from wdqs2008.codfw.wmnet -> wdqs2022.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 07:06 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, sync categories journal) xfer categories from wdqs2007.codfw.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 06:55 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, sync categories journal) xfer categories from wdqs2008.codfw.wmnet -> wdqs2022.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 06:55 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, sync categories journal) xfer categories from wdqs2007.codfw.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 06:54 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, sync categories journal) xfer categories from wdqs2008.codfw.wmnet -> wdqs2015.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 06:54 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, sync categories journal) xfer categories from wdqs2007.codfw.wmnet -> wdqs2014.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 06:43 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, sync categories journal) xfer categories from wdqs2007.codfw.wmnet -> wdqs2014.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 06:43 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, sync categories journal) xfer categories from wdqs2008.codfw.wmnet -> wdqs2015.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 06:43 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, sync categories journal) xfer categories from wdqs2008.codfw.wmnet -> wdqs2013.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 06:43 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, sync categories journal) xfer categories from wdqs2007.codfw.wmnet -> wdqs2012.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 06:40 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 06:39 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 06:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 06:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 06:32 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, sync categories journal) xfer categories from wdqs2008.codfw.wmnet -> wdqs2013.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 06:31 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, sync categories journal) xfer categories from wdqs2007.codfw.wmnet -> wdqs2012.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 06:03 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, sync categories journal) xfer categories from wdqs2008.codfw.wmnet -> wdqs2011.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 05:59 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, sync categories journal) xfer categories from wdqs2007.codfw.wmnet -> wdqs2010.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 05:51 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, sync categories journal) xfer categories from wdqs2008.codfw.wmnet -> wdqs2011.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 05:49 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T406920, sync categories journal) xfer categories from wdqs2008.codfw.wmnet -> wdqs2011.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 05:49 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, sync categories journal) xfer categories from wdqs2008.codfw.wmnet -> wdqs2011.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 05:48 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, sync categories journal) xfer categories from wdqs2007.codfw.wmnet -> wdqs2010.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 05:09 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, sync categories journal) xfer categories from wdqs2007.codfw.wmnet -> wdqs2008.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 04:58 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, sync categories journal) xfer categories from wdqs2007.codfw.wmnet -> wdqs2008.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 04:56 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T406920, sync categories journal) xfer categories from wdqs2007.codfw.wmnet -> wdqs2008.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 04:55 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, sync categories journal) xfer categories from wdqs2007.codfw.wmnet -> wdqs2008.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 04:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, sync categories journal) xfer categories from wdqs1011.eqiad.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 04:31 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, sync categories journal) xfer categories from wdqs1011.eqiad.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 02:21 tstarling@deploy2002: Finished scap sync-world: Backport for recentchanges: QueryRateEstimator improvements (T403798), recentchanges: Restore table qualifiers in change tag field expressions (T408040) (duration: 15m 30s)
  • 02:17 tstarling@deploy2002: tstarling: Continuing with sync
  • 02:11 tstarling@deploy2002: tstarling: Backport for recentchanges: QueryRateEstimator improvements (T403798), recentchanges: Restore table qualifiers in change tag field expressions (T408040) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 02:05 tstarling@deploy2002: Started scap sync-world: Backport for recentchanges: QueryRateEstimator improvements (T403798), recentchanges: Restore table qualifiers in change tag field expressions (T408040)
  • 01:35 fceratto@cumin1003: END (FAIL) - Cookbook sre.mysql.clone_es (exit_code=99) of es2034.codfw.wmnet onto es2057.codfw.wmnet
  • 00:18 cwhite: restart apache2 on gerrit1003

2025-10-22

  • 22:49 Reedy: T407057 - ran mwscript extensions/OATHAuth/maintenance/MoveRecoveryCodesFromTOTP.php --wiki=metawiki
  • 22:35 jhathaway@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest1003']
  • 22:34 jhathaway@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
  • 22:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be2078']
  • 22:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2078']
  • 22:32 Reedy: T407057 - ran foreachwikiindblist private.dblist extensions/OATHAuth/maintenance/MoveRecoveryCodesFromTOTP.php
  • 22:31 Reedy: T407057 - ran foreachwikiindblist fishbowl.dblist extensions/OATHAuth/maintenance/MoveRecoveryCodesFromTOTP.php
  • 22:30 jdlrobson@deploy2002: Finished scap sync-world: Backport for Enable QuickSurveys on all wikis (T317841) (duration: 10m 12s)
  • 22:29 jhathaway@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sretest1003']
  • 22:26 jdlrobson@deploy2002: jdlrobson: Continuing with sync
  • 22:25 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be2078']
  • 22:25 Reedy: T407057 - ran mwscript extensions/OATHAuth/maintenance/MoveRecoveryCodesFromTOTP.php --wiki=officewiki
  • 22:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2078']
  • 22:24 jdlrobson@deploy2002: jdlrobson: Backport for Enable QuickSurveys on all wikis (T317841) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2078']
  • 22:23 jhathaway@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
  • 22:20 jdlrobson@deploy2002: Started scap sync-world: Backport for Enable QuickSurveys on all wikis (T317841)
  • 22:17 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2058']
  • 22:17 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2078']
  • 22:17 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
  • 22:15 jhathaway@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest1003']
  • 22:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be2078']
  • 22:15 jhathaway@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
  • 22:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2078']
  • 22:14 jhathaway@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2058']
  • 22:13 jhathaway@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest1002']
  • 22:12 jhathaway@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1002']
  • 22:06 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
  • 22:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be2078']
  • 22:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2078']
  • 21:09 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 21:09 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
  • 20:57 kgraessle@deploy2002: Finished scap sync-world: Backport for Fix InvalidArgumentException in Watchlist (T407996) (duration: 10m 49s)
  • 20:52 kgraessle@deploy2002: kgraessle: Continuing with sync
  • 20:50 kgraessle@deploy2002: kgraessle: Backport for Fix InvalidArgumentException in Watchlist (T407996) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:46 kgraessle@deploy2002: Started scap sync-world: Backport for Fix InvalidArgumentException in Watchlist (T407996)
  • 20:31 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
  • 20:30 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 20:26 krinkle@deploy2002: Finished scap sync-world: Backport for fix(MentorDashboard): fix caching for PHP 8.1 -> 8.3 migration (T407403) (duration: 12m 38s)
  • 20:23 cmooney@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:22 krinkle@deploy2002: krinkle: Continuing with sync
  • 20:22 cmooney@cumin1003: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:20 cmooney@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1005.eqiad.wmnet with OS trixie
  • 20:18 krinkle@deploy2002: krinkle: Backport for fix(MentorDashboard): fix caching for PHP 8.1 -> 8.3 migration (T407403) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:14 krinkle@deploy2002: Started scap sync-world: Backport for fix(MentorDashboard): fix caching for PHP 8.1 -> 8.3 migration (T407403)
  • 20:04 ebernhardson@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:04 ebernhardson@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:55 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:54 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:51 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:51 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:33 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 19:33 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 19:27 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 19:27 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 19:11 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest2001.codfw.wmnet with reason: sleep test
  • 19:07 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 19:06 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 19:06 sukhe: sudo cumin "A:cp" "run-puppet-agent --enable 'merging CR 1198132'"
  • 18:53 sukhe: sudo cumin "A:cp" "disable-puppet 'merging CR 1198132'"
  • 18:40 ejegg: fundraising civicrm upgraded from b82b0ef5 to 1bade506
  • 18:39 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1026.eqiad.wmnet, repooling both afterwards
  • 18:38 ejegg: payments-wiki upgraded from ea963482 to 3753f979
  • 18:34 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1026.eqiad.wmnet, repooling both afterwards
  • 18:34 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1025.eqiad.wmnet, repooling both afterwards
  • 18:29 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1025.eqiad.wmnet, repooling both afterwards
  • 18:28 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 18:18 cmooney@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS trixie
  • 18:17 cmooney@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1005.eqiad.wmnet with OS trixie
  • 18:16 dancy@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.24 refs T405680
  • 18:11 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on zuul2001.codfw.wmnet with reason: still in setup
  • 18:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on zuul1001.eqiad.wmnet with reason: still in setup
  • 18:09 Amir1: deleting local user_password on sul wikis (T104500)
  • 18:08 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on zuul2001.codfw.wmnet with reason: still in setup
  • 17:53 Amir1: mwscript-k8s --dblist=small --follow -- purgeUserOptions.php --login-age 11 (T406724)
  • 17:38 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_codfw and A:cp
  • 17:37 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp2041.codfw.wmnet
  • 17:31 cmooney@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS trixie
  • 17:30 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_codfw and A:cp
  • 17:30 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp2042.codfw.wmnet
  • 17:12 cmooney@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1005.eqiad.wmnet with OS trixie
  • 17:12 kamila@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:10 kamila@cumin1003: START - Cookbook sre.dns.netbox
  • 16:59 cmooney@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS trixie
  • 16:59 ejegg: SmashPig upgraded from b1f04532 to ecba7d88
  • 16:58 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp2039.codfw.wmnet
  • 16:57 cmooney@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:51 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp2040.codfw.wmnet
  • 16:44 cmooney@cumin1003: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:44 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:40 kamila@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wikikube-worker2203.codfw.wmnet with reason: host unresponsive
  • 16:39 cmooney@cumin1003: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:39 kamila@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2203.codfw.wmnet
  • 16:38 kamila@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2203.codfw.wmnet
  • 16:23 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:23 cmooney@cumin1003: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:16 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp2037.codfw.wmnet
  • 16:16 filippo@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 16:11 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:11 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp2038.codfw.wmnet
  • 16:11 cmooney@cumin1003: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:09 ejegg: SmashPig upgraded from 9a7e626c to b1f04532
  • 16:08 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:08 cmooney@cumin1003: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:06 ammarpad@deploy2002: mwscript-k8s job started: extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=bswiki --logwiki=metawiki Horvathbence200603 HorvBence # T407995
  • 16:05 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:05 cmooney@cumin1003: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:04 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on zuul2002.codfw.wmnet with reason: still in setup
  • 15:58 filippo@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 15:34 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp2035.codfw.wmnet
  • 15:30 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp2036.codfw.wmnet
  • 14:58 filippo@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 14:55 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp2033.codfw.wmnet
  • 14:50 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp2034.codfw.wmnet
  • 14:43 ejegg: fundraising civicrm upgraded from 5dfa45d3 to b82b0ef5
  • 14:28 filippo@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 14:25 ejegg: donorwiki upgraded from 039e5a15 to ea963482
  • 14:24 ejegg: payments-wiki upgraded from bb9ad03a to ea963482
  • 14:21 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
  • 14:20 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
  • 14:19 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1022.eqiad.wmnet, repooling both afterwards
  • 14:17 apine@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:16 apine@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:16 apine@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:16 apine@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:16 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:15 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:14 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp2031.codfw.wmnet
  • 14:14 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1022.eqiad.wmnet, repooling both afterwards
  • 14:14 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1021.eqiad.wmnet, repooling both afterwards
  • 14:11 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp2032.codfw.wmnet
  • 14:11 apine@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:10 apine@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:10 apine@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:09 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1021.eqiad.wmnet, repooling both afterwards
  • 14:09 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1020.eqiad.wmnet, repooling both afterwards
  • 14:09 apine@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:07 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:07 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:05 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1020.eqiad.wmnet, repooling both afterwards
  • 14:05 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1019.eqiad.wmnet, repooling both afterwards
  • 14:00 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1019.eqiad.wmnet, repooling both afterwards
  • 14:00 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1018.eqiad.wmnet, repooling both afterwards
  • 13:55 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1018.eqiad.wmnet, repooling both afterwards
  • 13:55 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1017.eqiad.wmnet, repooling both afterwards
  • 13:50 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1017.eqiad.wmnet, repooling both afterwards
  • 13:50 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling both afterwards
  • 13:45 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts es1030.eqiad.wmnet
  • 13:45 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:45 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es1030.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 13:45 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es1030.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 13:44 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling both afterwards
  • 13:44 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1015.eqiad.wmnet, repooling both afterwards
  • 13:43 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:41 marostegui@cumin1003: START - Cookbook sre.dns.netbox
  • 13:41 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Set Alias entity usage modifier limit to 10. (T401288) (duration: 20m 47s)
  • 13:40 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1015.eqiad.wmnet, repooling both afterwards
  • 13:40 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1014.eqiad.wmnet, repooling both afterwards
  • 13:37 lucaswerkmeister-wmde@deploy2002: seanleong-wmde, lucaswerkmeister-wmde: Continuing with sync
  • 13:36 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts es1030.eqiad.wmnet
  • 13:35 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1014.eqiad.wmnet, repooling both afterwards
  • 13:34 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp2029.codfw.wmnet
  • 13:34 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1013.eqiad.wmnet, repooling both afterwards
  • 13:32 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp2030.codfw.wmnet
  • 13:28 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1013.eqiad.wmnet, repooling both afterwards
  • 13:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on ssw1-d1-eqiad with reason: downtime ssw1-d1-eqiad until we have the monitoring checks fully working for the new platform
  • 13:26 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1012.eqiad.wmnet, repooling both afterwards
  • 13:26 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:25 lucaswerkmeister-wmde@deploy2002: seanleong-wmde, lucaswerkmeister-wmde: Backport for Set Alias entity usage modifier limit to 10. (T401288) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:22 jclark@cumin1003: START - Cookbook sre.hosts.provision for host sretest1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:21 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T406920, Update outdated categories info) xfer categories from wdqs1011.eqiad.wmnet -> wdqs1012.eqiad.wmnet, repooling both afterwards
  • 13:20 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Set Alias entity usage modifier limit to 10. (T401288)
  • 13:18 marostegui@cumin1003: dbctl commit (dc=all): 'db1263 (re)pooling @ 100%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84264 and previous config saved to /var/cache/conftool/dbconfig/20251022-131826-root.json
  • 13:15 mfossati@deploy2002: Finished scap sync-world: Backport for Deploy the ReaderExperiments extension to English Wikipedia (T406907) (duration: 10m 32s)
  • 13:11 mfossati@deploy2002: mfossati: Continuing with sync
  • 13:10 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
  • 13:09 mfossati@deploy2002: mfossati: Backport for Deploy the ReaderExperiments extension to English Wikipedia (T406907) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:06 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:04 mfossati@deploy2002: Started scap sync-world: Backport for Deploy the ReaderExperiments extension to English Wikipedia (T406907)
  • 13:04 jgleeson: SmashPig upgraded from aa45ee08 to 9a7e626c
  • 13:03 jclark@cumin1003: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 13:03 marostegui@cumin1003: dbctl commit (dc=all): 'db1263 (re)pooling @ 75%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84263 and previous config saved to /var/cache/conftool/dbconfig/20251022-130320-root.json
  • 13:02 marostegui@cumin1003: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84262 and previous config saved to /var/cache/conftool/dbconfig/20251022-130226-root.json
  • 13:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 12:54 reedy@deploy2002: Synchronized wmf-config/CommonSettings.php: T407167 (duration: 08m 29s)
  • 12:53 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 12:53 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp2027.codfw.wmnet
  • 12:50 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp2028.codfw.wmnet
  • 12:48 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:48 jclark@cumin1002: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:48 marostegui@cumin1003: dbctl commit (dc=all): 'db1263 (re)pooling @ 60%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84260 and previous config saved to /var/cache/conftool/dbconfig/20251022-124814-root.json
  • 12:47 marostegui@cumin1003: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84259 and previous config saved to /var/cache/conftool/dbconfig/20251022-124720-root.json
  • 12:45 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:45 jclark@cumin1003: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:41 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_codfw and A:cp
  • 12:41 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_codfw and A:cp
  • 12:40 cmooney@cumin1003: START - Cookbook sre.hosts.provision for host sretest1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:40 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:39 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Rate-limit by wmfuniq - oblivian@cumin1003"
  • 12:39 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Rate-limit by wmfuniq - oblivian@cumin1003
  • 12:38 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Rate-limit by wmfuniq - oblivian@cumin1003
  • 12:38 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Rate-limit by wmfuniq - oblivian@cumin1003"
  • 12:38 cmooney@cumin1003: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:38 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:37 cmooney@cumin1003: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:33 marostegui@cumin1003: dbctl commit (dc=all): 'db1263 (re)pooling @ 50%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84258 and previous config saved to /var/cache/conftool/dbconfig/20251022-123308-root.json
  • 12:32 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:32 jclark@cumin1003: START - Cookbook sre.hosts.provision for host sretest1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:32 marostegui@cumin1003: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84257 and previous config saved to /var/cache/conftool/dbconfig/20251022-123213-root.json
  • 12:20 marostegui@cumin1003: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84256 and previous config saved to /var/cache/conftool/dbconfig/20251022-122039-root.json
  • 12:19 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:19 jclark@cumin1003: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'db1263 (re)pooling @ 30%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84255 and previous config saved to /var/cache/conftool/dbconfig/20251022-121802-root.json
  • 12:17 marostegui@cumin1003: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84254 and previous config saved to /var/cache/conftool/dbconfig/20251022-121707-root.json
  • 12:11 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab
  • 12:08 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1184 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84253 and previous config saved to /var/cache/conftool/dbconfig/20251022-120853-marostegui.json
  • 12:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 12:05 marostegui@cumin1003: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84252 and previous config saved to /var/cache/conftool/dbconfig/20251022-120533-root.json
  • 12:03 cmooney@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ssw1-d1-eqiad
  • 12:03 cmooney@cumin1003: START - Cookbook sre.hosts.remove-downtime for ssw1-d1-eqiad
  • 12:02 marostegui@cumin1003: dbctl commit (dc=all): 'db1263 (re)pooling @ 25%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84251 and previous config saved to /var/cache/conftool/dbconfig/20251022-120256-root.json
  • 11:50 marostegui@cumin1003: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84249 and previous config saved to /var/cache/conftool/dbconfig/20251022-115027-root.json
  • 11:48 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:47 marostegui@cumin1003: dbctl commit (dc=all): 'db1263 (re)pooling @ 20%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84248 and previous config saved to /var/cache/conftool/dbconfig/20251022-114749-root.json
  • 11:46 cmooney@cumin1003: START - Cookbook sre.hosts.provision for host sretest1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:46 marostegui@cumin1003: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84247 and previous config saved to /var/cache/conftool/dbconfig/20251022-114629-root.json
  • 11:40 mvernon@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ms-be[1089-1090].eqiad.wmnet with reason: awaiting controller swap
  • 11:35 marostegui@cumin1003: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84246 and previous config saved to /var/cache/conftool/dbconfig/20251022-113521-root.json
  • 11:32 marostegui@cumin1003: dbctl commit (dc=all): 'db1263 (re)pooling @ 10%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84245 and previous config saved to /var/cache/conftool/dbconfig/20251022-113243-root.json
  • 11:31 marostegui@cumin1003: dbctl commit (dc=all): 'db2146 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84244 and previous config saved to /var/cache/conftool/dbconfig/20251022-113123-root.json
  • 11:30 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:30 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:30 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:29 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:28 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:27 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:27 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1196 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84243 and previous config saved to /var/cache/conftool/dbconfig/20251022-112732-marostegui.json
  • 11:27 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 11:26 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 11:26 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 11:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Upgrading
  • 11:25 cmooney@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest1006
  • 11:25 cmooney@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host sretest1006
  • 11:25 dreamyjazz@deploy2002: Finished scap sync-world: Backport for EventStreamConfig: Don't collect user-agent for suggested_investigations_interaction (T404177) (duration: 08m 48s)
  • 11:24 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:24 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:20 dreamyjazz@deploy2002: kharlan, dreamyjazz: Continuing with sync
  • 11:20 dreamyjazz@deploy2002: kharlan, dreamyjazz: Backport for EventStreamConfig: Don't collect user-agent for suggested_investigations_interaction (T404177) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:20 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 11:19 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 11:18 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:17 marostegui@cumin1003: dbctl commit (dc=all): 'db1263 (re)pooling @ 7%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84242 and previous config saved to /var/cache/conftool/dbconfig/20251022-111736-root.json
  • 11:16 cmooney@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest1006
  • 11:16 cmooney@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host sretest1006
  • 11:16 marostegui@cumin1003: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84241 and previous config saved to /var/cache/conftool/dbconfig/20251022-111617-root.json
  • 11:16 dreamyjazz@deploy2002: Started scap sync-world: Backport for EventStreamConfig: Don't collect user-agent for suggested_investigations_interaction (T404177)
  • 11:16 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:15 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 11:15 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:15 cmooney@cumin1003: START - Cookbook sre.hosts.provision for host sretest1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:15 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 11:14 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 11:14 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 11:09 kamila@cumin1003: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) depool for host wikikube-worker2203.codfw.wmnet
  • 11:08 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Fix abuse_filter_log index in TempUserIPLookup (T400280), Fix abuse_filter_log index in TempUserIPLookup (T400280) (duration: 10m 01s)
  • 11:08 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 11:07 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 11:06 kamila@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2203.codfw.wmnet
  • 11:05 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:05 cmooney@cumin1003: START - Cookbook sre.hosts.provision for host sretest1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:04 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab
  • 11:04 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
  • 11:04 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab replica
  • 11:03 dreamyjazz@deploy2002: dreamyjazz: Backport for Fix abuse_filter_log index in TempUserIPLookup (T400280), Fix abuse_filter_log index in TempUserIPLookup (T400280) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:02 marostegui@cumin1003: dbctl commit (dc=all): 'db1263 (re)pooling @ 5%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84240 and previous config saved to /var/cache/conftool/dbconfig/20251022-110230-root.json
  • 11:01 marostegui@cumin1003: dbctl commit (dc=all): 'db2146 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84239 and previous config saved to /var/cache/conftool/dbconfig/20251022-110111-root.json
  • 10:58 dreamyjazz@deploy2002: Started scap sync-world: Backport for Fix abuse_filter_log index in TempUserIPLookup (T400280), Fix abuse_filter_log index in TempUserIPLookup (T400280)
  • 10:54 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab replica
  • 10:54 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab replica
  • 10:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2146 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84238 and previous config saved to /var/cache/conftool/dbconfig/20251022-105255-marostegui.json
  • 10:52 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 10:49 kamila@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw (T405631)
  • 10:48 kamila@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw (T405631)
  • 10:47 marostegui@cumin1003: dbctl commit (dc=all): 'db1263 (re)pooling @ 1%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84237 and previous config saved to /var/cache/conftool/dbconfig/20251022-104724-root.json
  • 10:46 marostegui@cumin1003: dbctl commit (dc=all): 'Decrease es2028 weight', diff saved to https://phabricator.wikimedia.org/P84236 and previous config saved to /var/cache/conftool/dbconfig/20251022-104601-marostegui.json
  • 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Decrease db1262 weight', diff saved to https://phabricator.wikimedia.org/P84235 and previous config saved to /var/cache/conftool/dbconfig/20251022-104530-marostegui.json
  • 10:44 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab replica
  • 10:40 kamila@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad (T405631)
  • 10:39 kamila@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad (T405631)
  • 10:29 kamila@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic (T405631)
  • 10:27 kamila@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic (T405631)
  • 10:27 marostegui@cumin1003: dbctl commit (dc=all): 'db1251 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84234 and previous config saved to /var/cache/conftool/dbconfig/20251022-102732-root.json
  • 10:26 marostegui@cumin1003: dbctl commit (dc=all): 'db1262 (re)pooling @ 100%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84233 and previous config saved to /var/cache/conftool/dbconfig/20251022-102609-root.json
  • 10:12 marostegui@cumin1003: dbctl commit (dc=all): 'db1251 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84232 and previous config saved to /var/cache/conftool/dbconfig/20251022-101225-root.json
  • 10:11 marostegui@cumin1003: dbctl commit (dc=all): 'db1262 (re)pooling @ 75%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84231 and previous config saved to /var/cache/conftool/dbconfig/20251022-101103-root.json
  • 10:09 marostegui@cumin1003: dbctl commit (dc=all): 'Add db1263 to dbctl depooled T406550', diff saved to https://phabricator.wikimedia.org/P84230 and previous config saved to /var/cache/conftool/dbconfig/20251022-100920-marostegui.json
  • 09:58 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:57 marostegui@cumin1003: dbctl commit (dc=all): 'db1251 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84229 and previous config saved to /var/cache/conftool/dbconfig/20251022-095719-root.json
  • 09:55 marostegui@cumin1003: dbctl commit (dc=all): 'db1262 (re)pooling @ 60%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84228 and previous config saved to /var/cache/conftool/dbconfig/20251022-095557-root.json
  • 09:55 cmooney@cumin1003: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:52 cmooney@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest1005
  • 09:51 cmooney@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host sretest1005
  • 09:50 marostegui: Stop mariadb on es1030 for decommissioning T407953
  • 09:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on es1030.eqiad.wmnet with reason: Decommissioning
  • 09:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2034 - Depool es2034.codfw.wmnet to then clone it to es2057.codfw.wmnet - fceratto@cumin1003
  • 09:48 cmooney@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest1005
  • 09:48 fceratto@cumin1003: START - Cookbook sre.mysql.depool es2034 - Depool es2034.codfw.wmnet to then clone it to es2057.codfw.wmnet - fceratto@cumin1003
  • 09:48 cmooney@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host sretest1005
  • 09:47 fceratto@cumin1003: START - Cookbook sre.mysql.clone_es of es2034.codfw.wmnet onto es2057.codfw.wmnet
  • 09:42 marostegui@cumin1003: dbctl commit (dc=all): 'db1251 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84225 and previous config saved to /var/cache/conftool/dbconfig/20251022-094213-root.json
  • 09:40 marostegui@cumin1003: dbctl commit (dc=all): 'db1262 (re)pooling @ 50%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84224 and previous config saved to /var/cache/conftool/dbconfig/20251022-094051-root.json
  • 09:34 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1251 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84223 and previous config saved to /var/cache/conftool/dbconfig/20251022-093413-marostegui.json
  • 09:34 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1251.eqiad.wmnet with reason: Maintenance
  • 09:27 marostegui@cumin1003: dbctl commit (dc=all): 'Remove es1030 from dbctl T407953', diff saved to https://phabricator.wikimedia.org/P84222 and previous config saved to /var/cache/conftool/dbconfig/20251022-092747-marostegui.json
  • 09:25 marostegui@cumin1003: dbctl commit (dc=all): 'db1262 (re)pooling @ 30%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84221 and previous config saved to /var/cache/conftool/dbconfig/20251022-092545-root.json
  • 09:25 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on es2057.codfw.wmnet with reason: Setting up new ES host
  • 09:10 marostegui@cumin1003: dbctl commit (dc=all): 'db1262 (re)pooling @ 25%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84220 and previous config saved to /var/cache/conftool/dbconfig/20251022-091039-root.json
  • 09:04 marostegui@cumin1003: dbctl commit (dc=all): 'Reduce weight for db2245 - which was wrong', diff saved to https://phabricator.wikimedia.org/P84219 and previous config saved to /var/cache/conftool/dbconfig/20251022-090437-marostegui.json
  • 08:55 marostegui@cumin1003: dbctl commit (dc=all): 'db1262 (re)pooling @ 20%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84218 and previous config saved to /var/cache/conftool/dbconfig/20251022-085533-root.json
  • 08:40 marostegui@cumin1003: dbctl commit (dc=all): 'db1262 (re)pooling @ 10%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84217 and previous config saved to /var/cache/conftool/dbconfig/20251022-084027-root.json
  • 08:31 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1030 T407953', diff saved to https://phabricator.wikimedia.org/P84216 and previous config saved to /var/cache/conftool/dbconfig/20251022-083153-marostegui.json
  • 08:31 marostegui@cumin1003: dbctl commit (dc=all): 'Promote es1053 to es2 primary as es1030 will be decommissioned T406690 T407953', diff saved to https://phabricator.wikimedia.org/P84215 and previous config saved to /var/cache/conftool/dbconfig/20251022-083134-marostegui.json
  • 08:25 marostegui@cumin1003: dbctl commit (dc=all): 'db1262 (re)pooling @ 7%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84214 and previous config saved to /var/cache/conftool/dbconfig/20251022-082521-root.json
  • 08:17 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit1003.wikimedia.org
  • 08:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts es1029.eqiad.wmnet
  • 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es1029.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 08:13 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es1029.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 08:10 marostegui@cumin1003: dbctl commit (dc=all): 'db1262 (re)pooling @ 5%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84213 and previous config saved to /var/cache/conftool/dbconfig/20251022-081014-root.json
  • 08:08 marostegui@cumin1003: START - Cookbook sre.dns.netbox
  • 08:08 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host gerrit1003.wikimedia.org
  • 08:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest1005
  • 08:02 cmooney@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host sretest1005
  • 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts es1029.eqiad.wmnet
  • 07:59 cmooney@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest1005
  • 07:59 cmooney@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host sretest1005
  • 07:58 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest1005
  • 07:57 cmooney@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host sretest1005
  • 07:55 marostegui@cumin1003: dbctl commit (dc=all): 'db1262 (re)pooling @ 1%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84212 and previous config saved to /var/cache/conftool/dbconfig/20251022-075508-root.json
  • 07:52 marostegui@cumin1003: dbctl commit (dc=all): 'Add db1262 depooled T406550', diff saved to https://phabricator.wikimedia.org/P84211 and previous config saved to /var/cache/conftool/dbconfig/20251022-075234-marostegui.json
  • 04:37 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp3073.esams.wmnet
  • 04:37 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp3073.esams.wmnet
  • 04:37 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp3073.esams.wmnet
  • 04:36 fabfur: repooling cp3073 after reboot and removing downtime (T407110)
  • 03:28 tstarling@deploy2002: Finished scap sync-world: Backport for recentchanges: Temporary fix for incubator exception (duration: 09m 38s)
  • 03:24 tstarling@deploy2002: tstarling: Continuing with sync
  • 03:23 tstarling@deploy2002: tstarling: Backport for recentchanges: Temporary fix for incubator exception synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 03:19 tstarling@deploy2002: Started scap sync-world: Backport for recentchanges: Temporary fix for incubator exception
  • 00:41 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 00:34 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on cp3073.esams.wmnet with reason: depooled
  • 00:22 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db-test1003.eqiad.wmnet with OS trixie
  • 00:16 sukhe: sudo ipmitool -I lanplus -H "cp3073.mgmt.esams.wmnet" -U root -E chassis power cycle
  • 00:03 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1011.eqiad.wmnet

2025-10-21

  • 23:56 zabe@deploy2002: Finished scap sync-world: Backport for Use rc_source instead of rc_type in ORES config (T74157) (duration: 09m 16s)
  • 23:52 zabe@deploy2002: zabe: Continuing with sync
  • 23:52 zabe@deploy2002: zabe: Backport for Use rc_source instead of rc_type in ORES config (T74157) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:47 zabe@deploy2002: Started scap sync-world: Backport for Use rc_source instead of rc_type in ORES config (T74157)
  • 23:45 TimStarling: on db2202 creating table watchlist_member_T406843 for T406843 performance investigation
  • 23:44 zabe@deploy2002: Finished scap sync-world: Backport for PS.php: Add analytics-web service (T309738) (duration: 09m 55s)
  • 23:39 zabe@deploy2002: zabe: Continuing with sync
  • 23:38 zabe@deploy2002: zabe: Backport for PS.php: Add analytics-web service (T309738) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:34 zabe@deploy2002: Started scap sync-world: Backport for PS.php: Add analytics-web service (T309738)
  • 23:16 sukhe@cumin1003: END (FAIL) - Cookbook sre.cdn.roll-reboot (exit_code=1) rolling reboot on A:cp-text_esams and A:cp
  • 22:57 rzl@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 22:57 rzl@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 22:56 rzl@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 22:56 rzl@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 22:54 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 22:53 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 22:53 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 22:52 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 22:52 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 22:51 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 22:51 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 22:51 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 22:50 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 22:49 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 22:49 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 22:48 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 22:47 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/termbox: apply
  • 22:47 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/termbox: apply
  • 22:47 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 22:47 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 22:46 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
  • 22:46 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply
  • 22:45 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 22:45 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 22:45 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/recommendation-api: apply
  • 22:45 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/recommendation-api: apply
  • 22:44 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 22:44 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 22:44 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
  • 22:43 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/push-notifications: apply
  • 22:42 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 22:42 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply
  • 22:41 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 22:41 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 22:40 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 22:40 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 22:40 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 22:39 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 22:39 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 22:38 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 22:38 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 22:36 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 22:36 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
  • 22:36 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply
  • 22:35 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 22:35 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 22:34 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 22:34 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 22:34 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 22:33 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 22:31 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/kartotherian: apply
  • 22:31 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/kartotherian: apply
  • 22:30 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
  • 22:30 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/ipoid: apply
  • 22:30 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
  • 22:30 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
  • 22:29 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 22:29 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 22:29 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 22:28 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 22:27 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 22:26 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 22:26 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 22:26 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 22:25 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 22:25 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 22:23 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 22:23 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 22:23 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 22:23 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 22:22 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 22:22 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 22:22 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 22:21 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 22:21 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 22:21 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 22:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 22:19 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 22:19 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 22:19 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 22:19 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 22:18 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 22:18 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 22:18 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 22:17 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: apply
  • 22:17 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/commons-impact-analytics: apply
  • 22:17 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 22:16 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 22:16 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:16 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:16 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
  • 22:15 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
  • 22:15 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 22:14 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 22:13 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 22:12 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 22:12 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 22:11 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 22:11 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/apertium: apply
  • 22:11 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/apertium: apply
  • 22:10 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 22:10 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 22:09 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 22:09 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 22:08 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 22:08 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 22:08 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 22:07 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 22:04 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 22:04 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 22:04 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 22:03 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_esams and A:cp
  • 22:03 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp3081.esams.wmnet
  • 22:03 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 22:02 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 22:02 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 22:02 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 22:02 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 22:02 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 22:02 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 22:02 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: apply
  • 22:01 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/recommendation-api: apply
  • 22:00 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 22:00 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 21:59 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
  • 21:59 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
  • 21:47 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1011.eqiad.wmnet
  • 21:25 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp3072.esams.wmnet
  • 21:23 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp3080.esams.wmnet
  • 21:13 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 21:00 amastilovic@deploy2002: Finished deploy [analytics/refinery@44eaef5] (thin): Regular analytics weekly train THIN [analytics/refinery@44eaef53] (duration: 01m 04s)
  • 20:59 amastilovic@deploy2002: Started deploy [analytics/refinery@44eaef5] (thin): Regular analytics weekly train THIN [analytics/refinery@44eaef53]
  • 20:59 amastilovic@deploy2002: Finished deploy [analytics/refinery@44eaef5]: Regular analytics weekly train [analytics/refinery@44eaef53] (duration: 02m 38s)
  • 20:56 amastilovic@deploy2002: Started deploy [analytics/refinery@44eaef5]: Regular analytics weekly train [analytics/refinery@44eaef53]
  • 20:56 amastilovic@deploy2002: Finished deploy [analytics/refinery@44eaef5] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@44eaef53] (duration: 00m 59s)
  • 20:55 amastilovic@deploy2002: Started deploy [analytics/refinery@44eaef5] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@44eaef53]
  • 20:54 amastilovic: Deploying Refinery at 44eaef / user_central_id and ja4h X-Analytics
  • 20:43 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp3071.esams.wmnet
  • 20:41 arlolra@deploy2002: Finished scap sync-world: Backport for Do not insert empty document fragments as TOC lines (T407323) (duration: 10m 12s)
  • 20:40 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp3079.esams.wmnet
  • 20:37 arlolra@deploy2002: arlolra: Continuing with sync
  • 20:35 arlolra@deploy2002: arlolra: Backport for Do not insert empty document fragments as TOC lines (T407323) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:31 arlolra@deploy2002: Started scap sync-world: Backport for Do not insert empty document fragments as TOC lines (T407323)
  • 20:18 derick@deploy2002: Finished scap sync-world: Backport for user: Log user ID and name when Setup isn't fully initialized (T406433), user: Log user ID and name when Setup isn't fully initialized (T406433) (duration: 08m 50s)
  • 20:14 derick@deploy2002: d3r1ck01, derick: Continuing with sync
  • 20:14 derick@deploy2002: d3r1ck01, derick: Backport for user: Log user ID and name when Setup isn't fully initialized (T406433), user: Log user ID and name when Setup isn't fully initialized (T406433) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:10 derick@deploy2002: Started scap sync-world: Backport for user: Log user ID and name when Setup isn't fully initialized (T406433), user: Log user ID and name when Setup isn't fully initialized (T406433)
  • 20:01 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 20:00 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp3070.esams.wmnet
  • 20:00 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 19:59 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 19:59 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 19:58 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 19:57 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 19:57 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 19:57 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp3078.esams.wmnet
  • 19:55 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 19:54 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 19:54 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 19:54 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 19:54 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 19:53 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 19:53 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 19:53 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 19:52 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 19:52 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 19:52 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 19:52 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
  • 19:51 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
  • 19:51 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 19:51 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 19:51 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 19:50 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 19:33 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 19:32 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 19:32 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 19:32 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 19:31 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 19:31 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 19:31 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 19:31 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 19:31 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 19:31 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 19:31 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 19:31 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 19:30 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 19:30 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 19:30 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
  • 19:30 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply
  • 19:30 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 19:30 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 19:29 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 19:29 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 19:29 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 19:29 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 19:17 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp3069.esams.wmnet
  • 19:14 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp3077.esams.wmnet
  • 18:53 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.24 refs T405680
  • 18:34 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp3068.esams.wmnet
  • 18:33 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp3076.esams.wmnet
  • 18:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host zuul2002.codfw.wmnet with OS trixie
  • 17:59 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 17:59 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 17:59 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
  • 17:58 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
  • 17:58 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 17:58 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 17:58 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:58 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:57 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
  • 17:57 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
  • 17:54 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul2002.codfw.wmnet with reason: host reimage
  • 17:53 swfrench@deploy2002: Finished scap sync-world: Deploy mesh configuration change for T309738 (duration: 11m 50s)
  • 17:52 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp3075.esams.wmnet
  • 17:52 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp3067.esams.wmnet
  • 17:48 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on zuul2002.codfw.wmnet with reason: host reimage
  • 17:46 swfrench@deploy2002: Started scap sync-world: Deploy mesh configuration change for T309738
  • 17:36 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_eqiad and A:cp
  • 17:36 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp1114.eqiad.wmnet
  • 17:34 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_eqiad and A:cp
  • 17:34 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp1115.eqiad.wmnet
  • 17:32 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 17:32 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 17:31 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 17:31 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 17:31 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 17:31 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 17:30 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 17:30 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/apertium: apply
  • 17:27 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host zuul2002.codfw.wmnet with OS trixie
  • 17:26 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 17:25 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 17:20 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host zuul1002.eqiad.wmnet with OS trixie
  • 17:20 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:19 cmooney@cumin1003: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:18 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:18 cmooney@cumin1003: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:17 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:17 cmooney@cumin1003: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:09 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp3074.esams.wmnet
  • 17:09 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp3066.esams.wmnet
  • 17:03 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1002.eqiad.wmnet with reason: host reimage
  • 17:02 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 17:02 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:59 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:59 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:57 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 16:57 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on zuul1002.eqiad.wmnet with reason: host reimage
  • 16:57 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 16:56 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 16:56 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_esams and A:cp
  • 16:56 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 16:56 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_esams and A:cp
  • 16:55 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:55 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:55 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 16:54 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 16:54 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 16:53 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 16:53 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp1112.eqiad.wmnet
  • 16:53 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 16:52 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 16:52 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp1113.eqiad.wmnet
  • 16:42 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host zuul1002.eqiad.wmnet with OS trixie
  • 16:41 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 16:40 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 16:40 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 16:39 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 16:39 cmooney@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 16:39 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:38 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:38 cmooney@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 16:38 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 16:38 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 16:37 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 16:37 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 16:36 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 16:35 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 16:35 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host zuul2001.codfw.wmnet with OS trixie
  • 16:31 cmooney@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 16:30 cmooney@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 16:30 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 16:30 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 16:30 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 16:29 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 16:29 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:29 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:29 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 16:28 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 16:28 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 16:28 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 16:28 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 16:28 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 16:26 topranks: deploy new netbox-extras to support Nokia to Netbox hosts T405637
  • 16:24 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 16:23 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 16:16 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on P{sessionstore1*} and P{P:Cassandra}
  • 16:13 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul2001.codfw.wmnet with reason: host reimage
  • 16:11 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp1111.eqiad.wmnet
  • 16:11 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp1110.eqiad.wmnet
  • 16:08 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on zuul2001.codfw.wmnet with reason: host reimage
  • 15:53 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:53 rzl@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:53 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 15:52 urbanecm@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
  • 15:52 rzl@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 15:52 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 15:52 urbanecm@deploy2002: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
  • 15:51 rzl@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 15:51 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 15:51 rzl@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 15:51 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 15:51 rzl@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 15:50 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 15:50 rzl@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 15:48 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host zuul2001.codfw.wmnet with OS trixie
  • 15:45 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on P{sessionstore1*} and P{P:Cassandra}
  • 15:38 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-be2078.codfw.wmnet
  • 15:35 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on P{sessionstore2*} and P{P:Cassandra}
  • 15:30 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp1109.eqiad.wmnet
  • 15:30 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp1108.eqiad.wmnet
  • 15:28 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Deploy: Improvements to known-client DSL and entity deletion - swfrench@cumin2002"
  • 15:28 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy: Improvements to known-client DSL and entity deletion - swfrench@cumin2002
  • 15:27 swfrench@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy: Improvements to known-client DSL and entity deletion - swfrench@cumin2002
  • 15:27 swfrench@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Deploy: Improvements to known-client DSL and entity deletion - swfrench@cumin2002"
  • 15:19 claime: Fix for envoy x-request-id and tracing deployed in all envs in staging and mw-on-k8s prod - T407826
  • 15:17 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 15:17 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 15:17 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:16 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:16 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 15:16 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 15:16 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 15:15 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 15:15 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 15:15 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 15:15 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:15 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 15:15 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 15:14 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 15:14 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 15:14 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 15:14 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 15:13 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 15:13 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 15:13 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 15:13 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:13 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:12 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 15:12 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 15:11 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 15:11 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 15:11 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 15:10 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 15:10 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 15:10 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 15:09 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 15:09 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 15:09 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
  • 15:09 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: apply
  • 15:08 mutante: rebooting phabricator prod server
  • 15:08 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:08 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:08 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
  • 15:07 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
  • 15:07 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 15:06 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
  • 15:06 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 15:06 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 15:06 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 15:06 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 15:04 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on P{sessionstore2*} and P{P:Cassandra}
  • 15:04 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/kartotherian: apply
  • 15:04 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/kartotherian: apply
  • 15:04 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 15:04 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 15:03 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 15:03 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on phab1004.eqiad.wmnet with reason: reboot for kernel
  • 15:03 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 15:03 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 15:02 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 15:02 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 15:02 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 15:02 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 15:01 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 15:01 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 15:01 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 15:01 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 15:00 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 15:00 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 15:00 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 15:00 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 15:00 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 15:00 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 15:00 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 14:59 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 14:59 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 14:58 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 14:58 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 14:58 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 14:57 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 14:57 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 14:57 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 14:57 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 14:56 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 14:56 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 14:55 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 14:55 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 14:55 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 14:55 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 14:55 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 14:55 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:54 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:54 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
  • 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sre1005 add dns entries - cmooney@cumin1003"
  • 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sre1005 add dns entries - cmooney@cumin1003"
  • 14:53 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply
  • 14:53 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:53 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:53 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 14:52 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 14:52 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Update sendVerifyEmailReminderNotification to use relative timestamp (T58074), Update sendVerifyEmailReminderNotification to use relative timestamp (T58074) (duration: 08m 52s)
  • 14:49 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp1107.eqiad.wmnet
  • 14:49 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp1106.eqiad.wmnet
  • 14:49 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 14:49 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 14:48 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:48 jiji@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:47 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
  • 14:47 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:47 dreamyjazz@deploy2002: dreamyjazz: Backport for Update sendVerifyEmailReminderNotification to use relative timestamp (T58074), Update sendVerifyEmailReminderNotification to use relative timestamp (T58074) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:47 jiji@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:43 dreamyjazz@deploy2002: Started scap sync-world: Backport for Update sendVerifyEmailReminderNotification to use relative timestamp (T58074), Update sendVerifyEmailReminderNotification to use relative timestamp (T58074)
  • 14:42 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 14:42 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 14:42 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/apertium: apply
  • 14:42 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/apertium: apply
  • 14:42 dancy@deploy2002: Pruned MediaWiki: 1.45.0-wmf.21 (duration: 02m 24s)
  • 14:22 eileen: civicrm upgraded from f58e11b7 to 5dfa45d3
  • 14:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 46997
  • 14:13 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 46997
  • 14:08 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp1104.eqiad.wmnet
  • 14:08 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp1105.eqiad.wmnet
  • 14:06 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp40[50-52].ulsfo.wmnet} and A:cp
  • 14:06 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp4052.ulsfo.wmnet
  • 13:56 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:53 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [igwiki] Create 'autopatrolled' and 'rollbacker' usergroups (T407439), Throttle exemption for Editathon by Wikimedistas en Cruce - 6/7 November 2025 (T407630), [specieswiki] Enable USERLANGUAGE magic word (T406583), [[gerrit:1197614|[hsbwiktionary] Enable importing from enwiktionary (T40771
  • 13:48 lucaswerkmeister-wmde@deploy2002: superpes, lucaswerkmeister-wmde: Continuing with sync
  • 13:45 Lucas_WMDE: (cont.) (T407713)]], [dawikisource] Enable RC Patrol (T407790) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:45 lucaswerkmeister-wmde@deploy2002: superpes, lucaswerkmeister-wmde: Backport for [igwiki] Create 'autopatrolled' and 'rollbacker' usergroups (T407439), Throttle exemption for Editathon by Wikimedistas en Cruce - 6/7 November 2025 (T407630), [specieswiki] Enable USERLANGUAGE magic word (T406583), [[gerrit:1197614|[hsbwiktionary] Enable importing from enwiktionary
  • 13:44 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
  • 13:43 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
  • 13:41 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [igwiki] Create 'autopatrolled' and 'rollbacker' usergroups (T407439), Throttle exemption for Editathon by Wikimedistas en Cruce - 6/7 November 2025 (T407630), [specieswiki] Enable USERLANGUAGE magic word (T406583), [[gerrit:1197614|[hsbwiktionary] Enable importing from enwiktionary (T407713
  • 13:28 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp1103.eqiad.wmnet
  • 13:27 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp1102.eqiad.wmnet
  • 13:25 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp4051.ulsfo.wmnet
  • 13:07 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
  • 13:06 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
  • 13:05 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 13:04 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: sync
  • 12:50 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Define CheckUser Suggested Investigations event stream (T404177), CheckUser UserInfoCard: Enable XTools menu link on SUL wikis (T406012) (duration: 11m 42s)
  • 12:47 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp1101.eqiad.wmnet
  • 12:47 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp1100.eqiad.wmnet
  • 12:46 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
  • 12:42 dreamyjazz@deploy2002: dreamyjazz: Backport for Define CheckUser Suggested Investigations event stream (T404177), CheckUser UserInfoCard: Enable XTools menu link on SUL wikis (T406012) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:42 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp4050.ulsfo.wmnet
  • 12:42 topranks: stopping netbox service on netbox-dev2003 to update db from live netbox
  • 12:38 dreamyjazz@deploy2002: Started scap sync-world: Backport for Define CheckUser Suggested Investigations event stream (T404177), CheckUser UserInfoCard: Enable XTools menu link on SUL wikis (T406012)
  • 12:37 marostegui@cumin1003: dbctl commit (dc=all): 'Remove es1029 from dbctl T407832', diff saved to https://phabricator.wikimedia.org/P84200 and previous config saved to /var/cache/conftool/dbconfig/20251021-123706-marostegui.json
  • 12:36 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_eqiad and A:cp
  • 12:36 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_eqiad and A:cp
  • 12:36 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 12:35 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 12:35 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 12:34 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 12:32 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-videoscaler: apply
  • 12:32 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-videoscaler: apply
  • 12:31 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp40[50-52].ulsfo.wmnet} and A:cp
  • 12:30 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 12:29 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 12:29 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:29 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:29 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 12:29 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 12:28 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 12:27 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 12:27 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 12:23 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 12:22 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 12:22 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 12:22 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 12:21 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 12:21 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 12:19 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 12:12 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 12:10 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 12:09 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 12:09 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 12:06 effie: restarted pybal on lvs2014*
  • 12:02 effie: restarted pybal on lvs1019*
  • 12:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 11:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 11:49 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2056 slowly with 10 steps - Pooling in new host
  • 11:49 marostegui@cumin1003: dbctl commit (dc=all): 'db2245 (re)pooling @ 100%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84198 and previous config saved to /var/cache/conftool/dbconfig/20251021-114917-root.json
  • 11:47 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:47 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:47 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:42 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:42 effie: restarted pybal on lvs1020*,lvs2014*
  • 11:40 marostegui@cumin1003: dbctl commit (dc=all): 'Increase sretest2003 weight in es1 T407352', diff saved to https://phabricator.wikimedia.org/P84197 and previous config saved to /var/cache/conftool/dbconfig/20251021-114005-marostegui.json
  • 11:34 marostegui@cumin1003: dbctl commit (dc=all): 'db2245 (re)pooling @ 75%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84195 and previous config saved to /var/cache/conftool/dbconfig/20251021-113411-root.json
  • 11:32 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:31 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 11:31 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 11:30 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 11:30 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:29 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 11:19 marostegui@cumin1003: dbctl commit (dc=all): 'db2245 (re)pooling @ 60%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84193 and previous config saved to /var/cache/conftool/dbconfig/20251021-111905-root.json
  • 11:11 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 11:04 marostegui@cumin1003: dbctl commit (dc=all): 'db2245 (re)pooling @ 50%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84191 and previous config saved to /var/cache/conftool/dbconfig/20251021-110359-root.json
  • 11:01 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: sync
  • 11:00 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 10:59 elukey@deploy2002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 10:48 marostegui@cumin1003: dbctl commit (dc=all): 'db2245 (re)pooling @ 30%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84189 and previous config saved to /var/cache/conftool/dbconfig/20251021-104853-root.json
  • 10:39 oblivian@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/jaeger: apply
  • 10:38 oblivian@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/jaeger: apply
  • 10:38 oblivian@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/jaeger: apply
  • 10:38 oblivian@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/jaeger: apply
  • 10:38 ladsgroup@deploy2002: Finished scap sync-world: Backport for api: Fix incorrect templatelinks query in ApiQueryInfo (T407842) (duration: 10m 07s)
  • 10:34 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 10:33 marostegui@cumin1003: dbctl commit (dc=all): 'db2245 (re)pooling @ 25%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84187 and previous config saved to /var/cache/conftool/dbconfig/20251021-103347-root.json
  • 10:32 ladsgroup@deploy2002: ladsgroup: Backport for api: Fix incorrect templatelinks query in ApiQueryInfo (T407842) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:28 ladsgroup@deploy2002: Started scap sync-world: Backport for api: Fix incorrect templatelinks query in ApiQueryInfo (T407842)
  • 10:27 oblivian@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/jaeger: apply
  • 10:27 oblivian@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/jaeger: apply
  • 10:26 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Revert "Implement new usage types for statement with qualifiers and references" (T401290 T407684 T407744) (duration: 09m 15s)
  • 10:22 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
  • 10:22 marostegui@cumin1003: dbctl commit (dc=all): 'db1234 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84186 and previous config saved to /var/cache/conftool/dbconfig/20251021-102209-root.json
  • 10:21 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Revert "Implement new usage types for statement with qualifiers and references" (T401290 T407684 T407744) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:18 marostegui@cumin1003: dbctl commit (dc=all): 'db2245 (re)pooling @ 20%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84184 and previous config saved to /var/cache/conftool/dbconfig/20251021-101841-root.json
  • 10:17 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Revert "Implement new usage types for statement with qualifiers and references" (T401290 T407684 T407744)
  • 10:16 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 10:09 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 10:07 marostegui@cumin1003: dbctl commit (dc=all): 'db1234 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84183 and previous config saved to /var/cache/conftool/dbconfig/20251021-100703-root.json
  • 10:06 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: sync
  • 10:03 marostegui@cumin1003: dbctl commit (dc=all): 'db2245 (re)pooling @ 10%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84181 and previous config saved to /var/cache/conftool/dbconfig/20251021-100335-root.json
  • 10:01 marostegui@cumin1003: dbctl commit (dc=all): 'db2246 (re)pooling @ 100%: Pooling new host in s4', diff saved to https://phabricator.wikimedia.org/P84180 and previous config saved to /var/cache/conftool/dbconfig/20251021-100158-root.json
  • 09:51 marostegui@cumin1003: dbctl commit (dc=all): 'db1234 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84179 and previous config saved to /var/cache/conftool/dbconfig/20251021-095157-root.json
  • 09:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 09:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 09:49 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 09:49 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 09:48 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 09:48 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 09:48 marostegui@cumin1003: dbctl commit (dc=all): 'db2245 (re)pooling @ 7%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84177 and previous config saved to /var/cache/conftool/dbconfig/20251021-094829-root.json
  • 09:46 marostegui@cumin1003: dbctl commit (dc=all): 'db2246 (re)pooling @ 75%: Pooling new host in s4', diff saved to https://phabricator.wikimedia.org/P84176 and previous config saved to /var/cache/conftool/dbconfig/20251021-094652-root.json
  • 09:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 09:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 09:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:36 marostegui@cumin1003: dbctl commit (dc=all): 'db1234 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84175 and previous config saved to /var/cache/conftool/dbconfig/20251021-093652-root.json
  • 09:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:33 marostegui@cumin1003: dbctl commit (dc=all): 'db2245 (re)pooling @ 5%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84173 and previous config saved to /var/cache/conftool/dbconfig/20251021-093323-root.json
  • 09:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:31 marostegui@cumin1003: dbctl commit (dc=all): 'db2246 (re)pooling @ 60%: Pooling new host in s4', diff saved to https://phabricator.wikimedia.org/P84172 and previous config saved to /var/cache/conftool/dbconfig/20251021-093146-root.json
  • 09:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:29 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1234 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84171 and previous config saved to /var/cache/conftool/dbconfig/20251021-092911-marostegui.json
  • 09:29 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 09:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:23 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db-test1003.eqiad.wmnet with reason: host reimage
  • 09:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:20 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:20 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:18 marostegui@cumin1003: dbctl commit (dc=all): 'db2245 (re)pooling @ 1%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P84170 and previous config saved to /var/cache/conftool/dbconfig/20251021-091817-root.json
  • 09:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db-test1003.eqiad.wmnet with reason: host reimage
  • 09:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:16 marostegui@cumin1003: dbctl commit (dc=all): 'db2246 (re)pooling @ 50%: Pooling new host in s4', diff saved to https://phabricator.wikimedia.org/P84169 and previous config saved to /var/cache/conftool/dbconfig/20251021-091640-root.json
  • 09:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
  • 09:14 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2245 depooled T406551', diff saved to https://phabricator.wikimedia.org/P84168 and previous config saved to /var/cache/conftool/dbconfig/20251021-091418-marostegui.json
  • 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db-test1003.eqiad.wmnet with OS trixie
  • 09:03 fceratto@cumin1003: START - Cookbook sre.mysql.pool es2056 slowly with 10 steps - Pooling in new host
  • 09:02 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es2056.codfw.wmnet
  • 09:02 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es2056.codfw.wmnet
  • 09:01 marostegui@cumin1003: dbctl commit (dc=all): 'db2246 (re)pooling @ 30%: Pooling new host in s4', diff saved to https://phabricator.wikimedia.org/P84167 and previous config saved to /var/cache/conftool/dbconfig/20251021-090134-root.json
  • 08:48 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 60 hosts with reason: downtime new nokia devices in case they alert during tests
  • 08:46 marostegui@cumin1003: dbctl commit (dc=all): 'db2246 (re)pooling @ 25%: Pooling new host in s4', diff saved to https://phabricator.wikimedia.org/P84166 and previous config saved to /var/cache/conftool/dbconfig/20251021-084628-root.json
  • 08:46 urbanecm@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
  • 08:46 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "run sync to add new nokia switches - cmooney@cumin1003 - T405558"
  • 08:45 urbanecm@deploy2002: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
  • 08:45 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "run sync to add new nokia switches - cmooney@cumin1003 - T405558"
  • 08:44 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:42 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 08:39 elukey: restart cfssl-multirootca on pki nodes to pick up new discovery settings (see https://gerrit.wikimedia.org/r/c/operations/puppet/+/1196920)
  • 08:32 marostegui@cumin1003: dbctl commit (dc=all): 'es2028 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P84164 and previous config saved to /var/cache/conftool/dbconfig/20251021-083231-root.json
  • 08:31 marostegui@cumin1003: dbctl commit (dc=all): 'db2246 (re)pooling @ 20%: Pooling new host in s4', diff saved to https://phabricator.wikimedia.org/P84163 and previous config saved to /var/cache/conftool/dbconfig/20251021-083122-root.json
  • 08:17 marostegui@cumin1003: dbctl commit (dc=all): 'es2028 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P84162 and previous config saved to /var/cache/conftool/dbconfig/20251021-081725-root.json
  • 08:16 marostegui@cumin1003: dbctl commit (dc=all): 'Increase sretest2003 weight in es1 T407352', diff saved to https://phabricator.wikimedia.org/P84161 and previous config saved to /var/cache/conftool/dbconfig/20251021-081644-marostegui.json
  • 08:16 marostegui@cumin1003: dbctl commit (dc=all): 'db2246 (re)pooling @ 10%: Pooling new host in s4', diff saved to https://phabricator.wikimedia.org/P84160 and previous config saved to /var/cache/conftool/dbconfig/20251021-081616-root.json
  • 08:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es2033.codfw.wmnet onto es2056.codfw.wmnet
  • 08:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2033 gradually with 4 steps - Pool es2033.codfw.wmnet in after cloning
  • 08:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-growthbook: apply
  • 08:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-growthbook: apply
  • 08:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:07 marostegui@cumin1003: dbctl commit (dc=all): 'Increase sretest2003 weight in es1 T407352', diff saved to https://phabricator.wikimedia.org/P84158 and previous config saved to /var/cache/conftool/dbconfig/20251021-080733-marostegui.json
  • 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:04 marostegui@cumin1003: dbctl commit (dc=all): 'Increase sretest2003 weight in es1 T407352', diff saved to https://phabricator.wikimedia.org/P84157 and previous config saved to /var/cache/conftool/dbconfig/20251021-080412-marostegui.json
  • 08:02 marostegui@cumin1003: dbctl commit (dc=all): 'es2028 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P84156 and previous config saved to /var/cache/conftool/dbconfig/20251021-080219-root.json
  • 08:01 marostegui@cumin1003: dbctl commit (dc=all): 'db2246 (re)pooling @ 7%: Pooling new host in s4', diff saved to https://phabricator.wikimedia.org/P84155 and previous config saved to /var/cache/conftool/dbconfig/20251021-080110-root.json
  • 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Pool sretest2003 with minimal weight T407352', diff saved to https://phabricator.wikimedia.org/P84154 and previous config saved to /var/cache/conftool/dbconfig/20251021-075741-marostegui.json
  • 07:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-growthbook: apply
  • 07:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-growthbook: apply
  • 07:48 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:48 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ipv6 from sretest2003 - marostegui@cumin1003"
  • 07:47 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ipv6 from sretest2003 - marostegui@cumin1003"
  • 07:47 marostegui@cumin1003: dbctl commit (dc=all): 'es2028 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P84152 and previous config saved to /var/cache/conftool/dbconfig/20251021-074713-root.json
  • 07:46 marostegui@cumin1003: dbctl commit (dc=all): 'db2246 (re)pooling @ 5%: Pooling new host in s4', diff saved to https://phabricator.wikimedia.org/P84151 and previous config saved to /var/cache/conftool/dbconfig/20251021-074604-root.json
  • 07:43 marostegui@cumin1003: START - Cookbook sre.dns.netbox
  • 07:42 dcausse@deploy2002: Finished scap sync-world: Backport for cirrus: prepare completion search with defaultsort A/B test (T404858) (duration: 09m 58s)
  • 07:39 marostegui@cumin1003: dbctl commit (dc=all): 'Promote es1052 to es1 master and depool es1029 T407832', diff saved to https://phabricator.wikimedia.org/P84149 and previous config saved to /var/cache/conftool/dbconfig/20251021-073904-marostegui.json
  • 07:38 dcausse@deploy2002: dcausse: Continuing with sync
  • 07:37 dcausse@deploy2002: dcausse: Backport for cirrus: prepare completion search with defaultsort A/B test (T404858) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:32 dcausse@deploy2002: Started scap sync-world: Backport for cirrus: prepare completion search with defaultsort A/B test (T404858)
  • 07:32 marostegui@cumin1003: dbctl commit (dc=all): 'es2028 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P84148 and previous config saved to /var/cache/conftool/dbconfig/20251021-073207-root.json
  • 07:24 fceratto@cumin1003: START - Cookbook sre.mysql.pool es2033 gradually with 4 steps - Pool es2033.codfw.wmnet in after cloning
  • 07:22 esanders@deploy2002: Finished scap sync-world: Backport for Follow-up Iedb6361: Set insert-ignore on all insertSelect queries (T407357) (duration: 11m 45s)
  • 07:17 marostegui@cumin1003: dbctl commit (dc=all): 'es2028 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P84146 and previous config saved to /var/cache/conftool/dbconfig/20251021-071701-root.json
  • 07:17 marostegui@cumin1003: dbctl commit (dc=all): 'db1232 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84145 and previous config saved to /var/cache/conftool/dbconfig/20251021-071700-root.json
  • 07:16 marostegui@cumin1003: dbctl commit (dc=all): 'db2246 (re)pooling @ 1%: Pooling new host in s4', diff saved to https://phabricator.wikimedia.org/P84144 and previous config saved to /var/cache/conftool/dbconfig/20251021-071632-root.json
  • 07:16 esanders@deploy2002: esanders: Continuing with sync
  • 07:15 esanders@deploy2002: esanders: Backport for Follow-up Iedb6361: Set insert-ignore on all insertSelect queries (T407357) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:15 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2246 depooled T406551', diff saved to https://phabricator.wikimedia.org/P84143 and previous config saved to /var/cache/conftool/dbconfig/20251021-071503-marostegui.json
  • 07:10 esanders@deploy2002: Started scap sync-world: Backport for Follow-up Iedb6361: Set insert-ignore on all insertSelect queries (T407357)
  • 07:01 marostegui@cumin1003: dbctl commit (dc=all): 'es2028 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P84142 and previous config saved to /var/cache/conftool/dbconfig/20251021-070155-root.json
  • 07:01 marostegui@cumin1003: dbctl commit (dc=all): 'db1232 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84141 and previous config saved to /var/cache/conftool/dbconfig/20251021-070154-root.json
  • 06:53 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts es1028.eqiad.wmnet
  • 06:53 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:53 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es1028.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 06:52 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es1028.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 06:48 marostegui@cumin1003: START - Cookbook sre.dns.netbox
  • 06:46 marostegui@cumin1003: dbctl commit (dc=all): 'es2028 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P84140 and previous config saved to /var/cache/conftool/dbconfig/20251021-064649-root.json
  • 06:46 marostegui@cumin1003: dbctl commit (dc=all): 'db1232 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84139 and previous config saved to /var/cache/conftool/dbconfig/20251021-064648-root.json
  • 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts es1028.eqiad.wmnet
  • 06:44 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts es1028.eqiad.wmnet
  • 06:44 marostegui@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 06:39 marostegui@cumin1003: START - Cookbook sre.dns.netbox
  • 06:34 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts es1028.eqiad.wmnet
  • 06:32 marostegui: Add sretest2003 to dbctl depooled T407352
  • 06:31 marostegui@cumin1003: dbctl commit (dc=all): 'es2028 (re)pooling @ 7%: Repooling', diff saved to https://phabricator.wikimedia.org/P84138 and previous config saved to /var/cache/conftool/dbconfig/20251021-063143-root.json
  • 06:31 marostegui@cumin1003: dbctl commit (dc=all): 'db1232 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84137 and previous config saved to /var/cache/conftool/dbconfig/20251021-063142-root.json
  • 06:31 marostegui@cumin1003: dbctl commit (dc=all): 'Remove es1028 from dbctl T407720', diff saved to https://phabricator.wikimedia.org/P84136 and previous config saved to /var/cache/conftool/dbconfig/20251021-063134-marostegui.json
  • 06:17 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1232 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84135 and previous config saved to /var/cache/conftool/dbconfig/20251021-061748-marostegui.json
  • 06:17 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 06:10 marostegui@cumin1003: dbctl commit (dc=all): 'es2028 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P84134 and previous config saved to /var/cache/conftool/dbconfig/20251021-061049-root.json
  • 05:55 marostegui@cumin1003: dbctl commit (dc=all): 'es2028 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P84133 and previous config saved to /var/cache/conftool/dbconfig/20251021-055543-root.json
  • 04:10 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.45.0-wmf.24 refs T405680 (duration: 67m 03s)
  • 03:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.45.0-wmf.24 refs T405680
  • 01:37 larssandergreen: Updating civicrm from 84867feb to f58e11b7
  • 00:33 sukhe@cumin1003: END (ERROR) - Cookbook sre.cdn.roll-reboot (exit_code=97) rolling reboot on A:cp-ulsfo and not P{cp4037*} and A:cp
  • 00:08 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp4049.ulsfo.wmnet

2025-10-20

  • 23:27 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp4048.ulsfo.wmnet
  • 23:08 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_eqsin and A:cp
  • 23:08 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp5024.eqsin.wmnet
  • 23:05 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_eqsin and A:cp
  • 23:05 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp5032.eqsin.wmnet
  • 22:44 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp4047.ulsfo.wmnet
  • 22:25 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp5023.eqsin.wmnet
  • 22:22 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp5031.eqsin.wmnet
  • 22:22 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:cassandra-dev
  • 22:03 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp4046.ulsfo.wmnet
  • 21:56 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:cassandra-dev
  • 21:42 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp5022.eqsin.wmnet
  • 21:39 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp5030.eqsin.wmnet
  • 21:34 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on P{aqs[1014-1022]*} and P{P:Cassandra}
  • 21:22 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp4045.ulsfo.wmnet
  • 21:10 sbassett: Deployed security fix for T406639
  • 20:59 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp5021.eqsin.wmnet
  • 20:56 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp5029.eqsin.wmnet
  • 20:54 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:43 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:41 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp4044.ulsfo.wmnet
  • 20:22 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host zuul1001.eqiad.wmnet with OS trixie
  • 20:16 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp5020.eqsin.wmnet
  • 20:13 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on sretest2001.codfw.wmnet with reason: T383173
  • 20:13 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp5028.eqsin.wmnet
  • 19:58 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp4043.ulsfo.wmnet
  • 19:56 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on P{aqs[1014-1022]*} and P{P:Cassandra}
  • 19:32 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp5019.eqsin.wmnet
  • 19:30 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp5027.eqsin.wmnet
  • 19:17 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp4042.ulsfo.wmnet
  • 19:06 rzl@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 19:06 rzl@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 19:03 rzl@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 19:03 rzl@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 19:01 kemayo@deploy2002: Finished scap sync-world: Backport for Edit check: fix some eslint warnings (T407747) (duration: 08m 46s)
  • 18:57 kemayo@deploy2002: kemayo: Continuing with sync
  • 18:56 kemayo@deploy2002: kemayo: Backport for Edit check: fix some eslint warnings (T407747) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:52 kemayo@deploy2002: Started scap sync-world: Backport for Edit check: fix some eslint warnings (T407747)
  • 18:49 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp5018.eqsin.wmnet
  • 18:47 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp5026.eqsin.wmnet
  • 18:43 jgleeson: payments-wiki upgraded from 039e5a15 to a3017132
  • 18:40 jgleeson: civicrm upgraded from 7b70cb83 to 84867feb
  • 18:36 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp4041.ulsfo.wmnet
  • 18:24 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1001.eqiad.wmnet with reason: host reimage
  • 18:17 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on zuul1001.eqiad.wmnet with reason: host reimage
  • 18:06 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp5017.eqsin.wmnet
  • 18:06 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host zuul1001.eqiad.wmnet with OS trixie
  • 18:05 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:aqs-codfw
  • 18:04 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp5025.eqsin.wmnet
  • 17:54 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp4040.ulsfo.wmnet
  • 17:52 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_eqsin and A:cp
  • 17:52 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_eqsin and A:cp
  • 17:42 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6002.drmrs.wmnet with OS trixie
  • 17:35 jgleeson: SmashPig upgraded from c76dede6 to aa45ee08
  • 17:24 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 17:19 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 17:13 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp4039.ulsfo.wmnet
  • 16:48 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum6002.drmrs.wmnet with OS trixie
  • 16:46 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6001.drmrs.wmnet with OS trixie
  • 16:32 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp4038.ulsfo.wmnet
  • 16:29 Lucas_WMDE: UTC afternoon backport+config window (belatedly, more or less) done
  • 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 16:19 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 16:19 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Revert "Implement new usage types for statement with qualifiers and references" (T401290 T407684 T407744) (duration: 16m 43s)
  • 16:18 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-ulsfo and not P{cp4037*} and A:cp
  • 16:15 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
  • 16:07 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Revert "Implement new usage types for statement with qualifiers and references" (T401290 T407684 T407744) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:02 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Revert "Implement new usage types for statement with qualifiers and references" (T401290 T407684 T407744)
  • 15:51 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum6001.drmrs.wmnet with OS trixie
  • 15:50 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:aqs-codfw
  • 15:50 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/kartotherian: apply
  • 15:49 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/kartotherian: apply
  • 15:47 dancy@deploy2002: Installation of scap version "4.215.0" completed for 2 hosts
  • 15:46 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/kartotherian: apply
  • 15:46 dancy@deploy2002: Installing scap version "4.215.0" for 2 host(s)
  • 15:45 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/kartotherian: apply
  • 15:17 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/kartotherian: apply
  • 15:17 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/kartotherian: apply
  • 15:15 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/kartotherian: apply
  • 15:11 bking@deploy2002: helmfile [staging] START helmfile.d/services/kartotherian: apply
  • 15:11 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:11 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding franio2004 to codfw - jhancock@cumin1003"
  • 15:11 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding franio2004 to codfw - jhancock@cumin1003"
  • 15:08 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 14:55 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Restore "Implement new usage types for statement with qualifiers and references" (T401290 T407684 T407744) (duration: 08m 43s)
  • 14:51 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
  • 14:51 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Restore "Implement new usage types for statement with qualifiers and references" (T401290 T407684 T407744) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:46 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Restore "Implement new usage types for statement with qualifiers and references" (T401290 T407684 T407744)
  • 14:36 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, neslihanturan: Continuing with sync
  • 14:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, neslihanturan: Backport for Revert "Implement new usage types for statement with qualifiers and references" (T401290 T407684 T407744) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:27 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Revert "Implement new usage types for statement with qualifiers and references" (T401290 T407684 T407744)
  • 14:09 vgutierrez: cleaning up IPVS leftovers from HTTPS migration of wdqs-internal services - T193473
  • 14:02 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Enable REST Sandbox on all wikis (T389409), Growth: remove no longer in use GENewcomerTasksStarterDifficultyEnabled (T396382), MetricsPlatform: Initialize $wgMetricsPlatformExperimentStreamNames (T406332), Enable Special:EditWatchlist pagination on beta (T41510) (duration
  • 13:56 lucaswerkmeister-wmde@deploy2002: sgimeno, bpirkle, phuedx, lucaswerkmeister-wmde, cparle: Continuing with sync
  • 13:55 topranks: enable 2x40G lag from asw2-c-eqiad to ssw1-dX-eqiad T405579
  • 13:51 lucaswerkmeister-wmde@deploy2002: sgimeno, bpirkle, phuedx, lucaswerkmeister-wmde, cparle: Backport for Enable REST Sandbox on all wikis (T389409), Growth: remove no longer in use GENewcomerTasksStarterDifficultyEnabled (T396382), MetricsPlatform: Initialize $wgMetricsPlatformExperimentStreamNames (T406332), gerrit:1196703 Enable Special:EditWatchlist paginati
  • 13:46 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Enable REST Sandbox on all wikis (T389409), Growth: remove no longer in use GENewcomerTasksStarterDifficultyEnabled (T396382), MetricsPlatform: Initialize $wgMetricsPlatformExperimentStreamNames (T406332), Enable Special:EditWatchlist pagination on beta (T41510)
  • 13:43 esanders@deploy2002: Finished scap sync-world: Backport for Follow-up I6698875: Set insert-ignore on all insert queries (T407357) (duration: 38m 36s)
  • 13:30 esanders@deploy2002: esanders: Continuing with sync
  • 13:30 esanders@deploy2002: esanders: Backport for Follow-up I6698875: Set insert-ignore on all insert queries (T407357) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:04 esanders@deploy2002: Started scap sync-world: Backport for Follow-up I6698875: Set insert-ignore on all insert queries (T407357)
  • 13:01 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2033 - Depool es2033.codfw.wmnet to then clone it to es2056.codfw.wmnet - fceratto@cumin1003
  • 13:01 fceratto@cumin1003: START - Cookbook sre.mysql.depool es2033 - Depool es2033.codfw.wmnet to then clone it to es2056.codfw.wmnet - fceratto@cumin1003
  • 13:01 fceratto@cumin1003: START - Cookbook sre.mysql.clone_es of es2033.codfw.wmnet onto es2056.codfw.wmnet
  • 12:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on es2056.codfw.wmnet with reason: Setting up new ES host
  • 12:14 ozge@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 12:13 marostegui@cumin1003: dbctl commit (dc=all): 'db2247 (re)pooling @ 100%: Host provisioned T406551', diff saved to https://phabricator.wikimedia.org/P84128 and previous config saved to /var/cache/conftool/dbconfig/20251020-121311-root.json
  • 11:58 marostegui@cumin1003: dbctl commit (dc=all): 'db2247 (re)pooling @ 75%: Host provisioned T406551', diff saved to https://phabricator.wikimedia.org/P84127 and previous config saved to /var/cache/conftool/dbconfig/20251020-115805-root.json
  • 11:52 godog: add cloudcephosd1051 to the cluster via wmcs.ceph.osd.bootstrap_and_add - T405478
  • 11:43 marostegui@cumin1003: dbctl commit (dc=all): 'db1219 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84126 and previous config saved to /var/cache/conftool/dbconfig/20251020-114312-root.json
  • 11:43 marostegui@cumin1003: dbctl commit (dc=all): 'db2247 (re)pooling @ 60%: Host provisioned T406551', diff saved to https://phabricator.wikimedia.org/P84125 and previous config saved to /var/cache/conftool/dbconfig/20251020-114300-root.json
  • 11:28 marostegui@cumin1003: dbctl commit (dc=all): 'db1219 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84124 and previous config saved to /var/cache/conftool/dbconfig/20251020-112806-root.json
  • 11:27 marostegui@cumin1003: dbctl commit (dc=all): 'db2247 (re)pooling @ 50%: Host provisioned T406551', diff saved to https://phabricator.wikimedia.org/P84123 and previous config saved to /var/cache/conftool/dbconfig/20251020-112754-root.json
  • 11:21 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2055 gradually with 4 steps - Pooling in new host
  • 11:13 marostegui@cumin1003: dbctl commit (dc=all): 'db1219 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84121 and previous config saved to /var/cache/conftool/dbconfig/20251020-111300-root.json
  • 11:12 marostegui@cumin1003: dbctl commit (dc=all): 'db2247 (re)pooling @ 30%: Host provisioned T406551', diff saved to https://phabricator.wikimedia.org/P84120 and previous config saved to /var/cache/conftool/dbconfig/20251020-111248-root.json
  • 10:57 marostegui@cumin1003: dbctl commit (dc=all): 'db1219 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84118 and previous config saved to /var/cache/conftool/dbconfig/20251020-105754-root.json
  • 10:57 marostegui@cumin1003: dbctl commit (dc=all): 'db2247 (re)pooling @ 25%: Host provisioned T406551', diff saved to https://phabricator.wikimedia.org/P84117 and previous config saved to /var/cache/conftool/dbconfig/20251020-105742-root.json
  • 10:50 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1219 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84115 and previous config saved to /var/cache/conftool/dbconfig/20251020-105002-marostegui.json
  • 10:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 10:42 marostegui@cumin1003: dbctl commit (dc=all): 'db2247 (re)pooling @ 20%: Host provisioned T406551', diff saved to https://phabricator.wikimedia.org/P84114 and previous config saved to /var/cache/conftool/dbconfig/20251020-104236-root.json
  • 10:27 marostegui@cumin1003: dbctl commit (dc=all): 'db2247 (re)pooling @ 10%: Host provisioned T406551', diff saved to https://phabricator.wikimedia.org/P84112 and previous config saved to /var/cache/conftool/dbconfig/20251020-102730-root.json
  • 10:12 marostegui@cumin1003: dbctl commit (dc=all): 'db2247 (re)pooling @ 7%: Host provisioned T406551', diff saved to https://phabricator.wikimedia.org/P84111 and previous config saved to /var/cache/conftool/dbconfig/20251020-101224-root.json
  • 10:10 fceratto@cumin1003: START - Cookbook sre.mysql.pool es2055 gradually with 4 steps - Pooling in new host
  • 10:04 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es2055.codfw.wmnet
  • 10:04 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es2055.codfw.wmnet
  • 10:04 fceratto@cumin1003: dbctl commit (dc=all): 'Add es2055 T402859', diff saved to https://phabricator.wikimedia.org/P84110 and previous config saved to /var/cache/conftool/dbconfig/20251020-100419-fceratto.json
  • 09:57 marostegui@cumin1003: dbctl commit (dc=all): 'db2247 (re)pooling @ 5%: Host provisioned T406551', diff saved to https://phabricator.wikimedia.org/P84109 and previous config saved to /var/cache/conftool/dbconfig/20251020-095718-root.json
  • 09:51 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2032 gradually with 4 steps - Pooling in
  • 09:42 marostegui@cumin1003: dbctl commit (dc=all): 'db2247 (re)pooling @ 1%: Host provisioned T406551', diff saved to https://phabricator.wikimedia.org/P84107 and previous config saved to /var/cache/conftool/dbconfig/20251020-094212-root.json
  • 09:42 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2247 to dbctl T406551', diff saved to https://phabricator.wikimedia.org/P84106 and previous config saved to /var/cache/conftool/dbconfig/20251020-094207-marostegui.json
  • 09:12 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
  • 09:07 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
  • 09:07 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 09:06 fceratto@cumin1003: START - Cookbook sre.mysql.pool es2032 gradually with 4 steps - Pooling in
  • 09:05 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es2032.codfw.wmnet
  • 09:05 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es2032.codfw.wmnet
  • 09:03 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on es2032.codfw.wmnet with reason: Cloning tool bug
  • 08:41 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es2028 to clone sretest2003', diff saved to https://phabricator.wikimedia.org/P84102 and previous config saved to /var/cache/conftool/dbconfig/20251020-084143-marostegui.json
  • 08:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2032 - Cloning issue
  • 08:37 fceratto@cumin1003: START - Cookbook sre.mysql.depool es2032 - Cloning issue
  • 08:36 fceratto@cumin1003: END (FAIL) - Cookbook sre.mysql.clone_es (exit_code=99) of es2032.codfw.wmnet onto es2055.codfw.wmnet
  • 08:34 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) es2032 gradually with 4 steps - Pool es2032.codfw.wmnet in after cloning
  • 08:31 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on es2028.codfw.wmnet,sretest2003.codfw.wmnet with reason: Cloning
  • 08:30 marostegui@cumin1003: dbctl commit (dc=all): 'db1261 (re)pooling @ 100%: Host provisioned T406550', diff saved to https://phabricator.wikimedia.org/P84100 and previous config saved to /var/cache/conftool/dbconfig/20251020-083043-root.json
  • 08:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool es2032 gradually with 4 steps - Pool es2032.codfw.wmnet in after cloning
  • 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2248.codfw.wmnet onto db2245.codfw.wmnet
  • 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2248 gradually with 4 steps - Pool db2248.codfw.wmnet in after cloning
  • 08:15 marostegui@cumin1003: dbctl commit (dc=all): 'db1261 (re)pooling @ 75%: Host provisioned T406550', diff saved to https://phabricator.wikimedia.org/P84097 and previous config saved to /var/cache/conftool/dbconfig/20251020-081537-root.json
  • 08:14 marostegui@cumin1003: dbctl commit (dc=all): 'db1218 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84096 and previous config saved to /var/cache/conftool/dbconfig/20251020-081458-root.json
  • 08:08 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1028 T407720', diff saved to https://phabricator.wikimedia.org/P84095 and previous config saved to /var/cache/conftool/dbconfig/20251020-080804-marostegui.json
  • 08:07 marostegui@cumin1003: dbctl commit (dc=all): 'Promote es1051 to es3 primary as es1028 will be decommissioned T406690 T407720', diff saved to https://phabricator.wikimedia.org/P84094 and previous config saved to /var/cache/conftool/dbconfig/20251020-080721-marostegui.json
  • 08:04 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 60051
  • 08:04 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 60051
  • 08:00 marostegui@cumin1003: dbctl commit (dc=all): 'db1261 (re)pooling @ 60%: Host provisioned T406550', diff saved to https://phabricator.wikimedia.org/P84092 and previous config saved to /var/cache/conftool/dbconfig/20251020-080031-root.json
  • 07:59 marostegui@cumin1003: dbctl commit (dc=all): 'db1218 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84091 and previous config saved to /var/cache/conftool/dbconfig/20251020-075952-root.json
  • 07:45 marostegui@cumin1003: dbctl commit (dc=all): 'db1261 (re)pooling @ 50%: Host provisioned T406550', diff saved to https://phabricator.wikimedia.org/P84089 and previous config saved to /var/cache/conftool/dbconfig/20251020-074525-root.json
  • 07:44 marostegui@cumin1003: dbctl commit (dc=all): 'db1218 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84088 and previous config saved to /var/cache/conftool/dbconfig/20251020-074446-root.json
  • 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool db2248 gradually with 4 steps - Pool db2248.codfw.wmnet in after cloning
  • 07:35 marostegui: Stop MariaDB on es2032 to clone sretest2003 T407352
  • 07:30 marostegui@cumin1003: dbctl commit (dc=all): 'db1261 (re)pooling @ 30%: Host provisioned T406550', diff saved to https://phabricator.wikimedia.org/P84085 and previous config saved to /var/cache/conftool/dbconfig/20251020-073019-root.json
  • 07:29 marostegui@cumin1003: dbctl commit (dc=all): 'db1218 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84084 and previous config saved to /var/cache/conftool/dbconfig/20251020-072939-root.json
  • 07:28 marostegui: Stop MariaDB on es2032 to clone sretest2003 T407472
  • 07:27 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on es2032.codfw.wmnet,sretest2003.codfw.wmnet with reason: Cloning
  • 07:24 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 07:23 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 07:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:21 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1218 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84083 and previous config saved to /var/cache/conftool/dbconfig/20251020-072153-marostegui.json
  • 07:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 07:15 marostegui@cumin1003: dbctl commit (dc=all): 'db1261 (re)pooling @ 25%: Host provisioned T406550', diff saved to https://phabricator.wikimedia.org/P84082 and previous config saved to /var/cache/conftool/dbconfig/20251020-071513-root.json
  • 07:00 marostegui@cumin1003: dbctl commit (dc=all): 'db1261 (re)pooling @ 20%: Host provisioned T406550', diff saved to https://phabricator.wikimedia.org/P84081 and previous config saved to /var/cache/conftool/dbconfig/20251020-070007-root.json
  • 06:45 marostegui@cumin1003: dbctl commit (dc=all): 'db1261 (re)pooling @ 10%: Host provisioned T406550', diff saved to https://phabricator.wikimedia.org/P84080 and previous config saved to /var/cache/conftool/dbconfig/20251020-064501-root.json
  • 06:29 marostegui@cumin1003: dbctl commit (dc=all): 'db1261 (re)pooling @ 7%: Host provisioned T406550', diff saved to https://phabricator.wikimedia.org/P84079 and previous config saved to /var/cache/conftool/dbconfig/20251020-062955-root.json
  • 06:14 marostegui@cumin1003: dbctl commit (dc=all): 'db1261 (re)pooling @ 5%: Host provisioned T406550', diff saved to https://phabricator.wikimedia.org/P84078 and previous config saved to /var/cache/conftool/dbconfig/20251020-061449-root.json
  • 06:09 marostegui@cumin1003: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84077 and previous config saved to /var/cache/conftool/dbconfig/20251020-060956-root.json
  • 05:59 marostegui@cumin1003: dbctl commit (dc=all): 'db1261 (re)pooling @ 1%: Host provisioned T406550', diff saved to https://phabricator.wikimedia.org/P84076 and previous config saved to /var/cache/conftool/dbconfig/20251020-055942-root.json
  • 05:59 marostegui@cumin1003: dbctl commit (dc=all): 'Add db1261 depooled T406550', diff saved to https://phabricator.wikimedia.org/P84075 and previous config saved to /var/cache/conftool/dbconfig/20251020-055859-marostegui.json
  • 05:54 marostegui@cumin1003: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84074 and previous config saved to /var/cache/conftool/dbconfig/20251020-055450-root.json
  • 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts es1027.eqiad.wmnet
  • 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es1027.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 05:43 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es1027.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 05:39 marostegui@cumin1003: START - Cookbook sre.dns.netbox
  • 05:39 marostegui@cumin1003: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84072 and previous config saved to /var/cache/conftool/dbconfig/20251020-053944-root.json
  • 05:34 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts es1027.eqiad.wmnet
  • 05:24 marostegui@cumin1003: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84071 and previous config saved to /var/cache/conftool/dbconfig/20251020-052438-root.json
  • 05:20 marostegui@cumin1003: dbctl commit (dc=all): 'Remove es1027 from dbctl T407595', diff saved to https://phabricator.wikimedia.org/P84070 and previous config saved to /var/cache/conftool/dbconfig/20251020-052057-marostegui.json
  • 05:17 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1206 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84069 and previous config saved to /var/cache/conftool/dbconfig/20251020-051712-marostegui.json
  • 05:17 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db2248.codfw.wmnet onto db2245.codfw.wmnet
  • 05:04 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2248.codfw.wmnet onto db2245.codfw.wmnet
  • 05:04 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2248 - Depool db2248.codfw.wmnet to then clone it to db2245.codfw.wmnet - marostegui@cumin1003
  • 05:03 marostegui@cumin1003: START - Cookbook sre.mysql.depool db2248 - Depool db2248.codfw.wmnet to then clone it to db2245.codfw.wmnet - marostegui@cumin1003
  • 05:03 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db2248.codfw.wmnet onto db2245.codfw.wmnet
  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 52s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-19

  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 32s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-18

  • 08:45 brett@dns1004: END - running authdns-update
  • 08:44 brett@dns1004: START - running authdns-update
  • 08:25 brett@dns1004: END - running authdns-update
  • 08:23 brett@dns1004: START - running authdns-update

2025-10-17

  • 21:49 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2058']
  • 21:48 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
  • 21:45 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2058']
  • 21:44 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
  • 21:43 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2058']
  • 21:43 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
  • 21:43 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2058']
  • 21:42 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
  • 21:29 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2058']
  • 21:29 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
  • 21:26 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2058']
  • 21:26 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
  • 21:24 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
  • 21:21 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2058']
  • 21:20 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
  • 20:44 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
  • 20:44 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2058']
  • 20:43 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
  • 20:40 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
  • 20:37 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1008-dev.eqiad.wmnet with OS bookworm
  • 20:19 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
  • 20:18 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1008-dev.eqiad.wmnet with OS bookworm
  • 20:17 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 20:10 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 19:51 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1008-dev.eqiad.wmnet with OS trixie
  • 19:50 ejegg: donorwiki upgraded from 70a7050f to 039e5a15
  • 19:50 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 19:49 ejegg: payments-wiki upgraded from 70a7050f to 039e5a15
  • 19:11 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 19:11 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 18:47 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 18:45 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 17:09 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:08 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:09 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2058']
  • 16:01 jhathaway@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058']
  • 15:33 Dreamy_Jazz: Ran `mwscript-k8s --comment='First emails to users to get them to confirm their email address for T58074' extensions/WikimediaMaintenance/sendVerifyEmailReminderNotification.php --wiki=metawiki 20250917000000`
  • 13:09 vgutierrez: updating ca-certificates package on bookworm puppetservers
  • 13:01 marostegui@cumin1003: dbctl commit (dc=all): 'db1195 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84067 and previous config saved to /var/cache/conftool/dbconfig/20251017-130106-root.json
  • 12:54 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 12:54 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 12:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 12:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 12:46 marostegui@cumin1003: dbctl commit (dc=all): 'db1195 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84066 and previous config saved to /var/cache/conftool/dbconfig/20251017-124600-root.json
  • 12:30 marostegui@cumin1003: dbctl commit (dc=all): 'db1195 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84064 and previous config saved to /var/cache/conftool/dbconfig/20251017-123054-root.json
  • 12:15 marostegui@cumin1003: dbctl commit (dc=all): 'db1195 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84063 and previous config saved to /var/cache/conftool/dbconfig/20251017-121548-root.json
  • 12:07 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1195 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84062 and previous config saved to /var/cache/conftool/dbconfig/20251017-120737-marostegui.json
  • 12:07 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1195.eqiad.wmnet with reason: Maintenance
  • 11:38 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2248.codfw.wmnet onto db2246.codfw.wmnet
  • 11:38 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2248 gradually with 4 steps - Pool db2248.codfw.wmnet in after cloning
  • 11:11 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 11:11 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 11:11 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 11:06 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:06 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:52 marostegui@cumin1003: START - Cookbook sre.mysql.pool db2248 gradually with 4 steps - Pool db2248.codfw.wmnet in after cloning
  • 10:44 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:43 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:36 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:35 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:35 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:34 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:08 eileen: civicrm upgraded from ab1d21dc to 7b70cb83
  • 10:05 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 10:05 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 10:03 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:03 topranks: un-draining Arelion 100G transport eqiad <-> codfw following carrier fibre fix and return to stability T407578
  • 10:03 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 10:02 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:02 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 09:37 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 09:36 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 08:47 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 08:46 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 08:19 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2248 - Depool db2248.codfw.wmnet to then clone it to db2246.codfw.wmnet - marostegui@cumin1003
  • 08:19 marostegui@cumin1003: START - Cookbook sre.mysql.depool db2248 - Depool db2248.codfw.wmnet to then clone it to db2246.codfw.wmnet - marostegui@cumin1003
  • 08:19 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db2248.codfw.wmnet onto db2246.codfw.wmnet
  • 08:08 topranks: draining Arelion eqiad <-> codfw transport with OSPF metric and re-enabling port on cr1-eqiad
  • 08:04 fceratto@cumin1003: START - Cookbook sre.mysql.clone_es of es2032.codfw.wmnet onto es2055.codfw.wmnet
  • 07:42 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84056 and previous config saved to /var/cache/conftool/dbconfig/20251017-074221-root.json
  • 07:27 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84055 and previous config saved to /var/cache/conftool/dbconfig/20251017-072715-root.json
  • 07:12 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84054 and previous config saved to /var/cache/conftool/dbconfig/20251017-071209-root.json
  • 06:57 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84053 and previous config saved to /var/cache/conftool/dbconfig/20251017-065703-root.json
  • 06:41 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84052 and previous config saved to /var/cache/conftool/dbconfig/20251017-064157-root.json
  • 06:26 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84051 and previous config saved to /var/cache/conftool/dbconfig/20251017-062651-root.json
  • 06:11 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84050 and previous config saved to /var/cache/conftool/dbconfig/20251017-061145-root.json
  • 05:56 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84049 and previous config saved to /var/cache/conftool/dbconfig/20251017-055639-root.json
  • 05:45 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on es1027.eqiad.wmnet with reason: Cloning
  • 05:45 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1027 T407595', diff saved to https://phabricator.wikimedia.org/P84048 and previous config saved to /var/cache/conftool/dbconfig/20251017-054458-marostegui.json
  • 05:41 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84047 and previous config saved to /var/cache/conftool/dbconfig/20251017-054133-root.json
  • 05:26 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84046 and previous config saved to /var/cache/conftool/dbconfig/20251017-052627-root.json
  • 05:11 marostegui@cumin1003: dbctl commit (dc=all): 'es1056 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84045 and previous config saved to /var/cache/conftool/dbconfig/20251017-051121-root.json
  • 05:11 marostegui@cumin1003: dbctl commit (dc=all): 'Add es1056 to dbctl T406488', diff saved to https://phabricator.wikimedia.org/P84044 and previous config saved to /var/cache/conftool/dbconfig/20251017-051114-marostegui.json
  • 01:15 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 14m 04s)
  • 01:01 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-16

  • 23:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 23:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 23:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 23:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 23:19 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 23:18 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 23:18 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/termbox: apply
  • 23:17 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/termbox: apply
  • 23:17 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 23:16 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 23:15 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 23:15 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 23:13 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 23:13 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 23:12 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 23:11 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 23:11 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 23:11 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 23:10 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 23:10 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 23:09 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 23:09 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 23:08 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
  • 23:07 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply
  • 23:07 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/recommendation-api: apply
  • 23:06 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/recommendation-api: apply
  • 23:06 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
  • 23:05 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/push-notifications: apply
  • 23:05 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 23:03 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply
  • 23:03 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 23:03 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 23:02 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 22:59 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 22:59 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
  • 22:58 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply
  • 22:58 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 22:57 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 22:55 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 22:49 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 22:49 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 22:48 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 22:47 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
  • 22:47 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/ipoid: apply
  • 22:46 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
  • 22:46 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
  • 22:44 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 22:44 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 22:43 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 22:42 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 22:41 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 22:41 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 22:40 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 22:39 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 22:39 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 22:38 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 22:38 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 22:37 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 22:37 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 22:37 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 22:36 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 22:36 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 22:36 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 22:36 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 22:35 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 22:34 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 22:33 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 22:33 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 22:33 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 22:32 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 22:32 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 22:31 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 22:31 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: apply
  • 22:31 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/commons-impact-analytics: apply
  • 22:30 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:29 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:29 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
  • 22:28 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
  • 22:25 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 22:24 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 22:24 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 22:23 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 22:19 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 22:17 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/apertium: apply
  • 22:16 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/apertium: apply
  • 22:04 sbassett: Deployed security fix for T407131
  • 21:46 jdlrobson@deploy2002: Finished scap sync-world: Backport for Temporary user banner should not have such a high z-index (T407549) (duration: 15m 21s)
  • 21:42 jdlrobson@deploy2002: jdlrobson: Continuing with sync
  • 21:35 jdlrobson@deploy2002: jdlrobson: Backport for Temporary user banner should not have such a high z-index (T407549) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:31 jdlrobson@deploy2002: Started scap sync-world: Backport for Temporary user banner should not have such a high z-index (T407549)
  • 21:26 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7004.*
  • 21:23 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
  • 21:20 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp7004.magru.wmnet} and A:cp
  • 21:20 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp7004.magru.wmnet
  • 21:08 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp7004.magru.wmnet} and A:cp
  • 21:00 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7004.magru.wmnet with reason: Debugging sre.cdn.roll-reboot bugs
  • 20:59 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp7004.*
  • 20:56 bblack: see also https://phabricator.wikimedia.org/T407578 for above port disables
  • 20:51 bblack: disabling cr1-eqiad:et-1/1/2 and cr1-codfw:et-1/0/2 (both ends of same Arelion transport, been erroring/flapping for a while)
  • 20:50 eileen: civicrm upgraded from ac4c185b to ab1d21dc
  • 20:43 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
  • 20:43 ebernhardson@deploy2002: Finished scap sync-world: Backport for Add wgSitename for azwiktionary (T407358) (duration: 09m 29s)
  • 20:40 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
  • 20:38 ebernhardson@deploy2002: ebernhardson, nmw03: Continuing with sync
  • 20:38 ebernhardson@deploy2002: ebernhardson, nmw03: Backport for Add wgSitename for azwiktionary (T407358) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:33 ebernhardson@deploy2002: Started scap sync-world: Backport for Add wgSitename for azwiktionary (T407358)
  • 20:30 ebernhardson@deploy2002: Finished scap sync-world: Backport for Create "autopatrolled" user group on Danish Wikisource (T407281) (duration: 10m 57s)
  • 20:26 ebernhardson@deploy2002: ebernhardson, hamishz: Continuing with sync
  • 20:24 ebernhardson@deploy2002: ebernhardson, hamishz: Backport for Create "autopatrolled" user group on Danish Wikisource (T407281) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:20 ebernhardson@deploy2002: Started scap sync-world: Backport for Create "autopatrolled" user group on Danish Wikisource (T407281)
  • 20:19 ejegg: fundraising python tools upgraded from 698309f1 to 3b0b3fc0
  • 20:19 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
  • 20:18 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 20:15 ebernhardson@deploy2002: Finished scap sync-world: Backport for Revert "cirrus: Start AB test of did-you-mean profiles" (T390858) (duration: 09m 36s)
  • 20:11 ebernhardson@deploy2002: ebernhardson: Continuing with sync
  • 20:10 ebernhardson@deploy2002: ebernhardson: Backport for Revert "cirrus: Start AB test of did-you-mean profiles" (T390858) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:06 ebernhardson@deploy2002: Started scap sync-world: Backport for Revert "cirrus: Start AB test of did-you-mean profiles" (T390858)
  • 19:51 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 19:38 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 19:38 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 19:25 dancy: dancy@deploy2002 Installation of scap version "4.214.0" completed for 2 hosts
  • 19:22 dancy@deploy2002: Installing scap version "4.214.0" for 2 host(s)
  • 19:03 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 18:57 andrew@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM cloudbackup1002-dev.eqiad.wmnet
  • 18:44 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on gerrit2003.wikimedia.org with reason: no active host - disabled
  • 18:42 fceratto@cumin1003: END (FAIL) - Cookbook sre.mysql.clone_es (exit_code=99) of es2032.codfw.wmnet onto es2055.codfw.wmnet
  • 18:26 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS trixie
  • 18:26 brett@dns1004: END - running authdns-update
  • 18:25 brett@dns1004: START - running authdns-update
  • 18:08 brett: Import varnish 7.1.1-2~bpo13+wmf1 into trixie-wikimedia - T401832
  • 17:54 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2003.codfw.wmnet with OS bookworm
  • 17:38 swfrench@deploy2002: Finished scap sync-world: New PHP 8.3 production image (duration: 27m 32s)
  • 17:28 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage
  • 17:24 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage
  • 17:17 mforns@deploy2002: Finished deploy [analytics/refinery@6b7edca] (thin): Regular analytics weekly train THIN [analytics/refinery@6b7edcac] (duration: 01m 29s)
  • 17:16 mforns@deploy2002: Started deploy [analytics/refinery@6b7edca] (thin): Regular analytics weekly train THIN [analytics/refinery@6b7edcac]
  • 17:16 mforns@deploy2002: Finished deploy [analytics/refinery@6b7edca]: Regular analytics weekly train [analytics/refinery@6b7edcac] (duration: 06m 48s)
  • 17:12 swfrench@deploy2002: Started scap sync-world: New PHP 8.3 production image
  • 17:10 topranks: re-enable BGP sessions for lvs1018 on cr1-eqiad, cr2-eqiad after maintenance on the lvs host T405499
  • 17:09 mforns@deploy2002: Started deploy [analytics/refinery@6b7edca]: Regular analytics weekly train [analytics/refinery@6b7edcac]
  • 17:06 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bookworm
  • 17:00 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 16:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1018.eqiad.wmnet
  • 16:58 mforns@deploy2002: Finished deploy [analytics/refinery@6b7edca] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6b7edcac] (duration: 01m 16s)
  • 16:57 mforns@deploy2002: Started deploy [analytics/refinery@6b7edca] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6b7edcac]
  • 16:56 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs1018.eqiad.wmnet
  • 16:46 swfrench-wmf: reprepro include php8.3_8.3.26-1+wmf11u2 in component/php83
  • 16:34 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 16:20 topranks: disable BGP sessions for lvs1018 on cr1-eqiad, cr2-eqiad to move traffic to backup load-balancer lvs1020 T405499
  • 16:19 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1018.eqiad.wmnet with reason: remove lvs1018 enp94s0f0np0 link to rack E1
  • 16:14 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2032 - Depool es2032.codfw.wmnet to then clone it to es2055.codfw.wmnet - fceratto@cumin1003
  • 16:13 fceratto@cumin1003: START - Cookbook sre.mysql.depool es2032 - Depool es2032.codfw.wmnet to then clone it to es2055.codfw.wmnet - fceratto@cumin1003
  • 16:13 fceratto@cumin1003: START - Cookbook sre.mysql.clone_es of es2032.codfw.wmnet onto es2055.codfw.wmnet
  • 15:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on es2055.codfw.wmnet with reason: Setting up new ES host
  • 15:33 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp700[7-8].magru.wmnet [reason: pool after firmware updated]
  • 15:27 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp7008.magru.wmnet
  • 15:27 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for cp7008.magru.wmnet
  • 15:20 jhancock@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7008']
  • 15:15 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7008.magru.wmnet with reason: firmware upgrade
  • 15:10 jhancock@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7008']
  • 15:10 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp7007.magru.wmnet
  • 15:10 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for cp7007.magru.wmnet
  • 15:10 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp7008.magru.wmnet [reason: updating firmware]
  • 15:03 jhancock@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7007']
  • 14:54 jhancock@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7007']
  • 14:51 ejegg: donorwiki upgraded from d903982c to 70a7050f
  • 14:37 moritzm: installing libarchive security updates
  • 14:33 urandom: starting `removenode` of aqs1012-b (id=bc700f01-8120-4d77-908f-eea943470a25) — T407414
  • 14:30 moritzm: installing distro-info-data updates on Bookworm
  • 14:27 urandom: starting `removenode` of aqs1012-a (id=0b0f0cd5-a1f8-44e2-a8e2-75800ebaea80) — T407414
  • 14:17 tappof: bump space for prometheus k8s-dse in eqiad
  • 14:09 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp7008*} and A:cp
  • 14:09 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7008.magru.wmnet
  • 14:04 jmm@cumin2002: END (PASS) - Cookbook sre.pki.restart-reboot (exit_code=0) rolling reboot on A:pki
  • 13:59 sukhe: sudo ipmitool -I lanplus -H "cp7008.mgmt.magru.wmnet" -U root -E chassis power cycle
  • 13:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1260.eqiad.wmnet onto db1263.eqiad.wmnet
  • 13:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1260 gradually with 4 steps - Pool db1260.eqiad.wmnet in after cloning
  • 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet. on all recursors
  • 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet. on all recursors
  • 13:53 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
  • 13:53 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: sync
  • 13:50 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet. on all recursors
  • 13:49 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet. on all recursors
  • 13:49 jmm@cumin2002: START - Cookbook sre.pki.restart-reboot rolling reboot on A:pki
  • 13:28 zabe@deploy2002: Finished scap sync-world: Backport for BETA: Try using Hadoop QueryPage computations (T309738) (duration: 08m 09s)
  • 13:27 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp7008*} and A:cp
  • 13:24 zabe@deploy2002: zabe: Continuing with sync
  • 13:22 zabe@deploy2002: zabe: Backport for BETA: Try using Hadoop QueryPage computations (T309738) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:20 zabe@deploy2002: Started scap sync-world: Backport for BETA: Try using Hadoop QueryPage computations (T309738)
  • 13:13 esanders@deploy2002: Finished scap sync-world: Backport for LQT convert: Ignore duplicate key insert errors when command line flag set (T407357) (duration: 10m 14s)
  • 13:12 marostegui@cumin1003: START - Cookbook sre.mysql.pool db1260 gradually with 4 steps - Pool db1260.eqiad.wmnet in after cloning
  • 13:09 esanders@deploy2002: esanders: Continuing with sync
  • 13:06 esanders@deploy2002: esanders: Backport for LQT convert: Ignore duplicate key insert errors when command line flag set (T407357) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:03 esanders@deploy2002: Started scap sync-world: Backport for LQT convert: Ignore duplicate key insert errors when command line flag set (T407357)
  • 12:51 moritzm: installing git security updates
  • 12:36 moritzm: installing gst-plugins-base1.0 security updates
  • 12:13 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2054 slowly with 10 steps - Pooling in new host
  • 12:05 jmm@dns1004: END - running authdns-update
  • 12:03 jmm@dns1004: START - running authdns-update
  • 11:54 ozge@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 11:26 claime: sudo cumin 'A:cp' "enable-puppet 'Deploying gateway-check.lua changes - T406599 - cgoubert'"
  • 11:22 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 11:21 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 11:21 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 11:21 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 11:21 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:21 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:19 hnowlan@deploy2002: Finished deploy [restbase/deploy@0be0059]: deploy 9 new wikis from r/1177553 (duration: 27m 01s)
  • 11:12 moritzm: installing Squid security updates
  • 11:08 claime: sudo cumin 'A:cp' "disable-puppet 'Deploying gateway-check.lua changes - T406599 - cgoubert'"
  • 11:05 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 11:04 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 11:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:53 hnowlan@deploy2002: Started deploy [restbase/deploy@0be0059]: deploy 9 new wikis from r/1177553
  • 10:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:21 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84027 and previous config saved to /var/cache/conftool/dbconfig/20251016-102110-root.json
  • 10:15 moritzm: installing libfcgi security updates
  • 10:06 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84025 and previous config saved to /var/cache/conftool/dbconfig/20251016-100605-root.json
  • 09:57 fceratto@cumin1003: START - Cookbook sre.mysql.pool es2054 slowly with 10 steps - Pooling in new host
  • 09:56 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es2054.codfw.wmnet
  • 09:56 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es2054.codfw.wmnet
  • 09:55 fceratto@cumin1003: dbctl commit (dc=all): 'Add es2054 T402859', diff saved to https://phabricator.wikimedia.org/P84023 and previous config saved to /var/cache/conftool/dbconfig/20251016-095534-fceratto.json
  • 09:51 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84022 and previous config saved to /var/cache/conftool/dbconfig/20251016-095058-root.json
  • 09:35 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84021 and previous config saved to /var/cache/conftool/dbconfig/20251016-093553-root.json
  • 09:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1260 - Depool db1260.eqiad.wmnet to then clone it to db1263.eqiad.wmnet - marostegui@cumin1003
  • 09:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool db1260 - Depool db1260.eqiad.wmnet to then clone it to db1263.eqiad.wmnet - marostegui@cumin1003
  • 09:30 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db1260.eqiad.wmnet onto db1263.eqiad.wmnet
  • 09:20 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84019 and previous config saved to /var/cache/conftool/dbconfig/20251016-092047-root.json
  • 09:14 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ssw1-d1-eqiad.mgmt with reason: downtime ssw1-d1-eqiad until we have the monitoring checks fully working for the new platform
  • 09:13 marostegui@cumin1003: dbctl commit (dc=all): 'db1235 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84018 and previous config saved to /var/cache/conftool/dbconfig/20251016-091343-root.json
  • 09:05 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84017 and previous config saved to /var/cache/conftool/dbconfig/20251016-090541-root.json
  • 09:02 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:00 arnaudb@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on gerrit2002.wikimedia.org with reason: T407110
  • 09:00 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'db1235 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84016 and previous config saved to /var/cache/conftool/dbconfig/20251016-085837-root.json
  • 08:57 cmooney@dns2005: END - running authdns-update
  • 08:56 cmooney@dns2005: START - running authdns-update
  • 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1260.eqiad.wmnet onto db1262.eqiad.wmnet
  • 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1260 gradually with 4 steps - Pool db1260.eqiad.wmnet in after cloning
  • 08:50 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84014 and previous config saved to /var/cache/conftool/dbconfig/20251016-085035-root.json
  • 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'db1235 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84013 and previous config saved to /var/cache/conftool/dbconfig/20251016-084331-root.json
  • 08:36 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts es1026.eqiad.wmnet
  • 08:36 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:36 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es1026.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 08:35 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: es1026.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 08:35 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84011 and previous config saved to /var/cache/conftool/dbconfig/20251016-083529-root.json
  • 08:32 marostegui@cumin1003: START - Cookbook sre.dns.netbox
  • 08:32 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.23 refs T405679
  • 08:28 marostegui@cumin1003: dbctl commit (dc=all): 'db1235 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P84010 and previous config saved to /var/cache/conftool/dbconfig/20251016-082825-root.json
  • 08:26 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts es1026.eqiad.wmnet
  • 08:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 08:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 08:22 marostegui@cumin1003: dbctl commit (dc=all): 'db2188 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P84009 and previous config saved to /var/cache/conftool/dbconfig/20251016-082237-root.json
  • 08:20 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1235 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P84007 and previous config saved to /var/cache/conftool/dbconfig/20251016-082031-marostegui.json
  • 08:20 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 08:20 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84006 and previous config saved to /var/cache/conftool/dbconfig/20251016-082023-root.json
  • 08:15 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 08:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 08:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 08:09 marostegui@cumin1003: dbctl commit (dc=all): 'Remove es1026 from dbctl T407351', diff saved to https://phabricator.wikimedia.org/P84005 and previous config saved to /var/cache/conftool/dbconfig/20251016-080948-marostegui.json
  • 08:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 08:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 08:07 marostegui@cumin1003: dbctl commit (dc=all): 'db2188 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P84004 and previous config saved to /var/cache/conftool/dbconfig/20251016-080731-root.json
  • 08:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool db1260 gradually with 4 steps - Pool db1260.eqiad.wmnet in after cloning
  • 08:05 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P84002 and previous config saved to /var/cache/conftool/dbconfig/20251016-080518-root.json
  • 08:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es2033.codfw.wmnet onto es2054.codfw.wmnet
  • 08:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2033 gradually with 4 steps - Pool es2033.codfw.wmnet in after cloning
  • 07:55 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 264936
  • 07:54 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 264936
  • 07:52 marostegui@cumin1003: dbctl commit (dc=all): 'db2188 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P84000 and previous config saved to /var/cache/conftool/dbconfig/20251016-075225-root.json
  • 07:50 marostegui@cumin1003: dbctl commit (dc=all): 'es1055 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83999 and previous config saved to /var/cache/conftool/dbconfig/20251016-075012-root.json
  • 07:41 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 100%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83997 and previous config saved to /var/cache/conftool/dbconfig/20251016-074122-root.json
  • 07:41 marostegui@cumin1003: dbctl commit (dc=all): 'Add es1055 to dbctl depooled T406488', diff saved to https://phabricator.wikimedia.org/P83996 and previous config saved to /var/cache/conftool/dbconfig/20251016-074118-marostegui.json
  • 07:37 marostegui@cumin1003: dbctl commit (dc=all): 'db2188 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83995 and previous config saved to /var/cache/conftool/dbconfig/20251016-073719-root.json
  • 07:29 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2188 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83992 and previous config saved to /var/cache/conftool/dbconfig/20251016-072932-marostegui.json
  • 07:29 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 07:26 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 75%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83991 and previous config saved to /var/cache/conftool/dbconfig/20251016-072610-root.json
  • 07:18 fceratto@cumin1003: START - Cookbook sre.mysql.pool es2033 gradually with 4 steps - Pool es2033.codfw.wmnet in after cloning
  • 07:11 marostegui@cumin1003: dbctl commit (dc=all): 'db2145 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83989 and previous config saved to /var/cache/conftool/dbconfig/20251016-071136-root.json
  • 07:11 kostajh: UTC morning deploys done
  • 07:11 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 60%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83988 and previous config saved to /var/cache/conftool/dbconfig/20251016-071104-root.json
  • 07:09 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83987 and previous config saved to /var/cache/conftool/dbconfig/20251016-070916-root.json
  • 06:56 marostegui@cumin1003: dbctl commit (dc=all): 'db2145 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83986 and previous config saved to /var/cache/conftool/dbconfig/20251016-065630-root.json
  • 06:56 marostegui@cumin1003: dbctl commit (dc=all): 'db1186 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83985 and previous config saved to /var/cache/conftool/dbconfig/20251016-065612-root.json
  • 06:55 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 50%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83984 and previous config saved to /var/cache/conftool/dbconfig/20251016-065558-root.json
  • 06:54 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83983 and previous config saved to /var/cache/conftool/dbconfig/20251016-065410-root.json
  • 06:41 marostegui@cumin1003: dbctl commit (dc=all): 'db2145 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83982 and previous config saved to /var/cache/conftool/dbconfig/20251016-064124-root.json
  • 06:41 marostegui@cumin1003: dbctl commit (dc=all): 'db1186 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83981 and previous config saved to /var/cache/conftool/dbconfig/20251016-064106-root.json
  • 06:40 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 30%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83980 and previous config saved to /var/cache/conftool/dbconfig/20251016-064052-root.json
  • 06:39 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83979 and previous config saved to /var/cache/conftool/dbconfig/20251016-063904-root.json
  • 06:26 marostegui@cumin1003: dbctl commit (dc=all): 'db2145 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83978 and previous config saved to /var/cache/conftool/dbconfig/20251016-062618-root.json
  • 06:26 marostegui@cumin1003: dbctl commit (dc=all): 'db1186 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83977 and previous config saved to /var/cache/conftool/dbconfig/20251016-062600-root.json
  • 06:25 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 25%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83976 and previous config saved to /var/cache/conftool/dbconfig/20251016-062546-root.json
  • 06:24 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83975 and previous config saved to /var/cache/conftool/dbconfig/20251016-062358-root.json
  • 06:18 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2145 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83974 and previous config saved to /var/cache/conftool/dbconfig/20251016-061818-marostegui.json
  • 06:18 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 06:10 marostegui@cumin1003: dbctl commit (dc=all): 'db1186 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83973 and previous config saved to /var/cache/conftool/dbconfig/20251016-061054-root.json
  • 06:10 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 20%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83972 and previous config saved to /var/cache/conftool/dbconfig/20251016-061040-root.json
  • 06:08 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83971 and previous config saved to /var/cache/conftool/dbconfig/20251016-060852-root.json
  • 06:03 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1186 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83970 and previous config saved to /var/cache/conftool/dbconfig/20251016-060300-marostegui.json
  • 06:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 05:55 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 10%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83969 and previous config saved to /var/cache/conftool/dbconfig/20251016-055534-root.json
  • 05:53 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83968 and previous config saved to /var/cache/conftool/dbconfig/20251016-055346-root.json
  • 05:51 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2003.codfw.wmnet with OS bookworm
  • 05:45 marostegui@cumin1003: dbctl commit (dc=all): 'db2240 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83967 and previous config saved to /var/cache/conftool/dbconfig/20251016-054504-root.json
  • 05:40 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 7%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83965 and previous config saved to /var/cache/conftool/dbconfig/20251016-054027-root.json
  • 05:38 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83964 and previous config saved to /var/cache/conftool/dbconfig/20251016-053840-root.json
  • 05:29 marostegui@cumin1003: dbctl commit (dc=all): 'db2240 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83963 and previous config saved to /var/cache/conftool/dbconfig/20251016-052958-root.json
  • 05:25 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 5%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83962 and previous config saved to /var/cache/conftool/dbconfig/20251016-052521-root.json
  • 05:23 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83961 and previous config saved to /var/cache/conftool/dbconfig/20251016-052335-root.json
  • 05:14 marostegui@cumin1003: dbctl commit (dc=all): 'db2240 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83960 and previous config saved to /var/cache/conftool/dbconfig/20251016-051452-root.json
  • 05:10 marostegui@cumin1003: dbctl commit (dc=all): 'db2248 (re)pooling @ 1%: Pooling 1P host in s4', diff saved to https://phabricator.wikimedia.org/P83959 and previous config saved to /var/cache/conftool/dbconfig/20251016-051015-root.json
  • 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2248 to dbctl depooled T406551', diff saved to https://phabricator.wikimedia.org/P83958 and previous config saved to /var/cache/conftool/dbconfig/20251016-050917-marostegui.json
  • 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83957 and previous config saved to /var/cache/conftool/dbconfig/20251016-050829-root.json
  • 04:59 marostegui@cumin1003: dbctl commit (dc=all): 'db2240 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83956 and previous config saved to /var/cache/conftool/dbconfig/20251016-045946-root.json
  • 04:58 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bookworm
  • 04:53 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83955 and previous config saved to /var/cache/conftool/dbconfig/20251016-045323-root.json
  • 04:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2240.codfw.wmnet with reason: Maintenance
  • 04:47 marostegui@dns1006: END - running authdns-update
  • 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2240 T407177', diff saved to https://phabricator.wikimedia.org/P83954 and previous config saved to /var/cache/conftool/dbconfig/20251016-044650-marostegui.json
  • 04:46 marostegui@dns1006: START - running authdns-update
  • 04:45 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2179 to s4 primary and set section read-write T407177', diff saved to https://phabricator.wikimedia.org/P83953 and previous config saved to /var/cache/conftool/dbconfig/20251016-044557-marostegui.json
  • 04:45 marostegui@cumin1003: dbctl commit (dc=all): 'Set s4 codfw as read-only for maintenance - T407177', diff saved to https://phabricator.wikimedia.org/P83952 and previous config saved to /var/cache/conftool/dbconfig/20251016-044533-marostegui.json
  • 04:45 marostegui: Starting s4 codfw failover from db2240 to db2179 - T407177
  • 04:39 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 37 hosts with reason: Primary switchover s4 T407177
  • 04:39 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2179 with weight 0 T407177', diff saved to https://phabricator.wikimedia.org/P83951 and previous config saved to /var/cache/conftool/dbconfig/20251016-043920-marostegui.json
  • 04:38 marostegui@cumin1003: dbctl commit (dc=all): 'es1054 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83950 and previous config saved to /var/cache/conftool/dbconfig/20251016-043816-root.json
  • 04:35 marostegui@cumin1003: dbctl commit (dc=all): 'Add es1054 to dbctl depooled T406488', diff saved to https://phabricator.wikimedia.org/P83949 and previous config saved to /var/cache/conftool/dbconfig/20251016-043510-marostegui.json
  • 04:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1260 - Depool db1260.eqiad.wmnet to then clone it to db1262.eqiad.wmnet - marostegui@cumin1003
  • 04:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool db1260 - Depool db1260.eqiad.wmnet to then clone it to db1262.eqiad.wmnet - marostegui@cumin1003
  • 04:30 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db1260.eqiad.wmnet onto db1262.eqiad.wmnet
  • 04:16 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2006-dev.codfw.wmnet with OS trixie
  • 03:29 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2006-dev.codfw.wmnet with reason: host reimage
  • 03:22 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2006-dev.codfw.wmnet with reason: host reimage
  • 03:04 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2006-dev.codfw.wmnet with OS trixie
  • 02:50 eileen: civicrm upgraded from 25df5996 to ac4c185b
  • 00:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 00:18 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1012.eqiad.wmnet with OS bullseye

2025-10-15

  • 23:48 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aqs1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 23:36 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 23:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host aqs1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 22:56 andrew@cumin2002: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM cloudbackup1002-dev.eqiad.wmnet
  • 21:35 andrew@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM cloudbackup1002-dev.eqiad.wmnet
  • 21:29 bvibber@deploy2002: Finished scap sync-world: Backport for This copies .23's revert of the _broken version_ of the CORS image load fix! Production should work fine without it, but the broken version breaks things worse than the original bug. -bv (duration: 07m 13s)
  • 21:25 bvibber@deploy2002: bvibber: Continuing with sync
  • 21:24 bvibber@deploy2002: bvibber: Backport for This copies .23's revert of the _broken version_ of the CORS image load fix! Production should work fine without it, but the broken version breaks things worse than the original bug. -bv synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:22 bvibber@deploy2002: Started scap sync-world: Backport for This copies .23's revert of the _broken version_ of the CORS image load fix! Production should work fine without it, but the broken version breaks things worse than the original bug. -bv
  • 21:05 cjming: end of UTC late backport window
  • 21:03 cjming@deploy2002: Finished scap sync-world: Backport for Enable protection indicator for srwiki (T407183) (duration: 08m 25s)
  • 21:03 andrewbogott: adding additional disk space to cloudbackup1002-dev with "sudo gnt-instance modify --disk add:size=60g cloudbackup1002-dev.eqiad.wmnet"
  • 20:59 cjming@deploy2002: cjming, zoranzoki21: Continuing with sync
  • 20:57 cjming@deploy2002: cjming, zoranzoki21: Backport for Enable protection indicator for srwiki (T407183) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:55 cjming@deploy2002: Started scap sync-world: Backport for Enable protection indicator for srwiki (T407183)
  • 20:51 cjming@deploy2002: Finished scap sync-world: Backport for throttle rule for National Library Board Singapore workshop on 18oct2025 (T407422) (duration: 06m 48s)
  • 20:47 cjming@deploy2002: cjming, robertsky: Continuing with sync
  • 20:47 cjming@deploy2002: cjming, robertsky: Backport for throttle rule for National Library Board Singapore workshop on 18oct2025 (T407422) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:44 cjming@deploy2002: Started scap sync-world: Backport for throttle rule for National Library Board Singapore workshop on 18oct2025 (T407422)
  • 20:41 cjming@deploy2002: Finished scap sync-world: Backport for Add reader exp to common settings (T406916) (duration: 13m 51s)
  • 20:36 cjming@deploy2002: ksarabia, cjming: Continuing with sync
  • 20:33 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host aqs1012.eqiad.wmnet
  • 20:29 cjming@deploy2002: ksarabia, cjming: Backport for Add reader exp to common settings (T406916) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:27 cjming@deploy2002: Started scap sync-world: Backport for Add reader exp to common settings (T406916)
  • 20:24 cjming@deploy2002: Finished scap sync-world: Backport for Fix action_context for simple bot detection instrument (T406359) (duration: 07m 12s)
  • 20:20 cjming@deploy2002: cjming: Continuing with sync
  • 20:19 cjming@deploy2002: cjming: Backport for Fix action_context for simple bot detection instrument (T406359) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:17 cjming@deploy2002: Started scap sync-world: Backport for Fix action_context for simple bot detection instrument (T406359)
  • 20:12 kemayo@deploy2002: Finished scap sync-world: Backport for DiscussionTools: enable thanking comments (T366095) (duration: 07m 04s)
  • 20:08 kemayo@deploy2002: kemayo: Continuing with sync
  • 20:07 kemayo@deploy2002: kemayo: Backport for DiscussionTools: enable thanking comments (T366095) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:05 kemayo@deploy2002: Started scap sync-world: Backport for DiscussionTools: enable thanking comments (T366095)
  • 19:51 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2045.codfw.wmnet with OS bullseye
  • 19:42 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: sync
  • 19:42 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: sync
  • 19:41 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:41 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:41 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:41 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:41 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:38 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:30 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:30 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:30 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:30 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:29 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:29 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:29 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:29 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:28 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:28 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:28 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:28 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:27 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS bullseye
  • 19:21 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:21 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:21 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:20 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:20 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:19 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7007.magru.wmnet with reason: hardware issues, depooled
  • 19:19 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:12 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:12 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:09 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:08 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:08 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 19:03 sukhe: sudo ipmitool -I lanplus -H "cp7007.mgmt.magru.wmnet" -U root -E chassis power cycle
  • 18:58 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:57 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:56 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:55 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:54 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:54 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:54 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:52 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:51 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 18:45 eevans@cumin1003: START - Cookbook sre.hosts.dhcp for host aqs1012.eqiad.wmnet
  • 18:30 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_magru
  • 18:30 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7016.magru.wmnet
  • 18:18 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_drmrs
  • 18:18 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6016.drmrs.wmnet
  • 18:14 swfrench@deploy2002: Finished scap sync-world: Backport for Disable enrollment in PHP 8.3 (T405955) (duration: 10m 21s)
  • 18:14 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_drmrs
  • 18:14 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6008.drmrs.wmnet
  • 18:10 swfrench@deploy2002: swfrench: Continuing with sync
  • 18:10 sukhe@cumin1003: END (FAIL) - Cookbook sre.cdn.roll-reboot (exit_code=1) rolling reboot on A:cp-text_magru and not P{cp7001*} and A:cp
  • 18:07 swfrench@deploy2002: swfrench: Backport for Disable enrollment in PHP 8.3 (T405955) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:04 swfrench@deploy2002: Started scap sync-world: Backport for Disable enrollment in PHP 8.3 (T405955)
  • 17:47 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7015.magru.wmnet
  • 17:41 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 17:41 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 17:41 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 17:41 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 17:37 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6015.drmrs.wmnet
  • 17:34 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6007.drmrs.wmnet
  • 17:26 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host aqs1012.eqiad.wmnet
  • 17:23 swfrench@deploy2002: Finished scap sync-world: Revert to PHP 8.1 - T405955 (duration: 02m 47s)
  • 17:21 swfrench@deploy2002: Started scap sync-world: Revert to PHP 8.1 - T405955
  • 17:06 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 17:06 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 17:04 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7014.magru.wmnet
  • 16:58 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:58 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:55 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6014.drmrs.wmnet
  • 16:53 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6006.drmrs.wmnet
  • 16:53 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:53 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:52 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:52 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:52 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:52 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:49 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:49 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:47 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:46 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:40 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2053 slowly with 10 steps - Pooling in new host
  • 16:39 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:37 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:37 eevans@cumin1003: START - Cookbook sre.hosts.dhcp for host aqs1012.eqiad.wmnet
  • 16:37 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:37 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
  • 16:20 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7013.magru.wmnet
  • 16:19 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7006.magru.wmnet
  • 16:16 eevans@cumin1003: END (FAIL) - Cookbook sre.cassandra.roll-reboot (exit_code=1) rolling reboot on A:aqs-eqiad
  • 16:14 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6013.drmrs.wmnet
  • 16:12 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6005.drmrs.wmnet
  • 15:57 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 15:49 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2206.codfw.wmnet onto db2247.codfw.wmnet
  • 15:49 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2206 gradually with 4 steps - Pool db2206.codfw.wmnet in after cloning
  • 15:37 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7012.magru.wmnet
  • 15:37 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7005.magru.wmnet
  • 15:33 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6012.drmrs.wmnet
  • 15:31 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6004.drmrs.wmnet
  • 15:29 mforns@deploy2002: Finished deploy [analytics/refinery@94efa6e] (thin): Regular analytics weekly train THIN [analytics/refinery@94efa6e8] (duration: 01m 06s)
  • 15:28 mforns@deploy2002: Started deploy [analytics/refinery@94efa6e] (thin): Regular analytics weekly train THIN [analytics/refinery@94efa6e8]
  • 15:28 mforns@deploy2002: Finished deploy [analytics/refinery@94efa6e]: Regular analytics weekly train [analytics/refinery@94efa6e8] (duration: 06m 37s)
  • 15:21 mforns@deploy2002: Started deploy [analytics/refinery@94efa6e]: Regular analytics weekly train [analytics/refinery@94efa6e8]
  • 15:21 mforns@deploy2002: Finished deploy [analytics/refinery@94efa6e] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@94efa6e8] (duration: 02m 17s)
  • 15:19 mforns@deploy2002: Started deploy [analytics/refinery@94efa6e] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@94efa6e8]
  • 15:03 marostegui@cumin1003: START - Cookbook sre.mysql.pool db2206 gradually with 4 steps - Pool db2206.codfw.wmnet in after cloning
  • 14:54 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7004.magru.wmnet
  • 14:54 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7011.magru.wmnet
  • 14:51 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6011.drmrs.wmnet
  • 14:51 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6003.drmrs.wmnet
  • 14:44 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2033 - Depool es2033.codfw.wmnet to then clone it to es2054.codfw.wmnet - fceratto@cumin1003
  • 14:43 fceratto@cumin1003: START - Cookbook sre.mysql.depool es2033 - Depool es2033.codfw.wmnet to then clone it to es2054.codfw.wmnet - fceratto@cumin1003
  • 14:43 fceratto@cumin1003: START - Cookbook sre.mysql.clone_es of es2033.codfw.wmnet onto es2054.codfw.wmnet
  • 14:41 claime: armed keyholder on deploy[1003|2002] following reboots
  • 14:40 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy2002.codfw.wmnet
  • 14:39 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:37 moritzm: armed keyholder on cumin1002 following reboot
  • 14:35 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on es2054.codfw.wmnet with reason: Setting up new ES host
  • 14:34 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:aqs-eqiad
  • 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin1002.eqiad.wmnet
  • 14:34 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 14:31 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host parsoidtest1001.eqiad.wmnet
  • 14:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1002.eqiad.wmnet
  • 14:29 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host deploy2002.codfw.wmnet
  • 14:26 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host parsoidtest1001.eqiad.wmnet
  • 14:24 fceratto@cumin1003: START - Cookbook sre.mysql.pool es2053 slowly with 10 steps - Pooling in new host
  • 14:23 fceratto@cumin1002: dbctl commit (dc=all): 'es2053 set ipaddr before pool-in', diff saved to https://phabricator.wikimedia.org/P83930 and previous config saved to /var/cache/conftool/dbconfig/20251015-142339-fceratto.json
  • 14:22 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) es2053 slowly with 10 steps - Pooling in new host
  • 14:22 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug2002.codfw.wmnet
  • 14:20 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug2001.codfw.wmnet
  • 14:19 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 14:19 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy1003.eqiad.wmnet
  • 14:18 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 14:17 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 14:16 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host mwdebug2002.codfw.wmnet
  • 14:15 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug1002.eqiad.wmnet
  • 14:14 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host mwdebug2001.codfw.wmnet
  • 14:14 fceratto@cumin1003: START - Cookbook sre.mysql.pool es2053 slowly with 10 steps - Pooling in new host
  • 14:14 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host mwdebug1002.eqiad.wmnet
  • 14:12 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:12 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7003.magru.wmnet
  • 14:11 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug1001.eqiad.wmnet
  • 14:11 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7010.magru.wmnet
  • 14:11 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es2053.codfw.wmnet
  • 14:11 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es2053.codfw.wmnet
  • 14:11 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:11 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:11 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es2053.codfw.wmnet
  • 14:11 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es2053.codfw.wmnet
  • 14:11 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6010.drmrs.wmnet
  • 14:10 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6002.drmrs.wmnet
  • 14:10 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:09 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:09 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host deploy1003.eqiad.wmnet
  • 14:09 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:07 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host mwdebug1001.eqiad.wmnet
  • 14:05 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:04 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:04 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:04 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 14:04 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:04 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:03 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:03 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Add es2053 T402859', diff saved to https://phabricator.wikimedia.org/P83929 and previous config saved to /var/cache/conftool/dbconfig/20251015-135630-fceratto.json
  • 13:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:33 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:33 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:31 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:31 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:29 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6009.drmrs.wmnet
  • 13:29 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp6001.drmrs.wmnet
  • 13:29 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7002.magru.wmnet
  • 13:28 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7009.magru.wmnet
  • 13:26 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1260.eqiad.wmnet onto db1261.eqiad.wmnet
  • 13:26 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1260 gradually with 4 steps - Pool db1260.eqiad.wmnet in after cloning
  • 13:20 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
  • 13:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:18 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_drmrs
  • 13:18 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_drmrs
  • 13:17 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_magru
  • 13:16 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_magru and not P{cp7001*} and A:cp
  • 13:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
  • 13:16 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet [reason: already rebooted; pooling]
  • 13:15 sukhe@cumin1003: END (ERROR) - Cookbook sre.cdn.roll-reboot (exit_code=97) rolling reboot on A:cp-text_magru
  • 13:15 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_magru
  • 13:14 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:14 tchin@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:00 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:50 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:49 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1026 T407351', diff saved to https://phabricator.wikimedia.org/P83925 and previous config saved to /var/cache/conftool/dbconfig/20251015-124927-marostegui.json
  • 12:44 claime: enabling puppet on cp nodes for T406318
  • 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host parsoidtest1001.eqiad.wmnet
  • 12:29 claime: disabling puppet on cp nodes for T406318
  • 12:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host parsoidtest1001.eqiad.wmnet
  • 12:26 moritzm: installing ghostscript security updates
  • 12:25 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2009.codfw.wmnet
  • 12:18 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be2009.codfw.wmnet
  • 12:18 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2008.codfw.wmnet
  • 12:12 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be2008.codfw.wmnet
  • 12:12 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2007.codfw.wmnet
  • 12:05 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be2007.codfw.wmnet
  • 12:05 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2006.codfw.wmnet
  • 12:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2206 - Depool db2206.codfw.wmnet to then clone it to db2247.codfw.wmnet - marostegui@cumin1003
  • 12:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool db2206 - Depool db2206.codfw.wmnet to then clone it to db2247.codfw.wmnet - marostegui@cumin1003
  • 12:01 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db2206.codfw.wmnet onto db2247.codfw.wmnet
  • 11:57 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be2006.codfw.wmnet
  • 11:57 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2005.codfw.wmnet
  • 11:50 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be2005.codfw.wmnet
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
  • 11:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
  • 11:19 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:18 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:17 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:16 claime: Enabling puppet on all cp nodes for 1195679: trafficserver: remove gateway-check group-specific routes for rest.php - T406318
  • 11:16 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:15 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:14 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:12 claime: Enabling puppet on cp6015 for 1195679: trafficserver: remove gateway-check group-specific routes for rest.php - T406318
  • 11:07 claime: disabling puppet on cp nodes for 1195679: trafficserver: remove gateway-check group-specific routes for rest.php - T406318
  • 10:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 10:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 10:44 hashar@deploy2002: Finished scap sync-world: Backport for Replace call to deprecated method getImages (T407184) (duration: 32m 19s)
  • 10:40 hashar@deploy2002: hashar: Continuing with sync
  • 10:37 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1009.eqiad.wmnet
  • 10:30 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be1009.eqiad.wmnet
  • 10:30 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1008.eqiad.wmnet
  • 10:23 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be1008.eqiad.wmnet
  • 10:23 moritzm: installing libcommons-lang3-java security updates
  • 10:23 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1007.eqiad.wmnet
  • 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2044.codfw.wmnet with OS trixie
  • 10:18 hnowlan: deleted legacy EMEA/Americas business hours Splunk rotations
  • 10:16 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be1007.eqiad.wmnet
  • 10:16 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1006.eqiad.wmnet
  • 10:16 hashar@deploy2002: hashar: Backport for Replace call to deprecated method getImages (T407184) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:14 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:14 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:11 hashar@deploy2002: Started scap sync-world: Backport for Replace call to deprecated method getImages (T407184)
  • 10:09 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be1006.eqiad.wmnet
  • 10:09 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1005.eqiad.wmnet
  • 10:03 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2044.codfw.wmnet with reason: host reimage
  • 10:02 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be1005.eqiad.wmnet
  • 09:58 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2044.codfw.wmnet with reason: host reimage
  • 09:44 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2044.codfw.wmnet with OS trixie
  • 09:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2043.codfw.wmnet with OS trixie
  • 09:37 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS trixie
  • 09:33 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2206.codfw.wmnet onto db2248.codfw.wmnet
  • 09:32 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2206 gradually with 4 steps - Pool db2206.codfw.wmnet in after cloning
  • 09:32 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2043.codfw.wmnet']
  • 09:32 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2043.codfw.wmnet']
  • 09:31 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2045.codfw.wmnet']
  • 09:31 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2045.codfw.wmnet']
  • 09:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool db1260 gradually with 4 steps - Pool db1260.eqiad.wmnet in after cloning
  • 09:18 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 09:17 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 09:17 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:16 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 09:14 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:13 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 09:01 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2046.codfw.wmnet']
  • 09:01 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2046.codfw.wmnet']
  • 08:59 Amir1: mwscript-k8s -- purgeUserOptions.php --wiki=loginwiki (T406724)
  • 08:57 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2047.codfw.wmnet']
  • 08:51 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2047.codfw.wmnet']
  • 08:49 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 08:47 marostegui@cumin1003: START - Cookbook sre.mysql.pool db2206 gradually with 4 steps - Pool db2206.codfw.wmnet in after cloning
  • 08:44 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling reboot on A:thanos-fe
  • 08:41 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 08:33 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83918 and previous config saved to /var/cache/conftool/dbconfig/20251015-083339-root.json
  • 08:33 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83917 and previous config saved to /var/cache/conftool/dbconfig/20251015-083333-root.json
  • 08:30 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 08:29 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2050.codfw.wmnet']
  • 08:22 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2050.codfw.wmnet']
  • 08:22 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2051.codfw.wmnet']
  • 08:19 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.23 refs T405679
  • 08:18 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83916 and previous config saved to /var/cache/conftool/dbconfig/20251015-081833-root.json
  • 08:18 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83915 and previous config saved to /var/cache/conftool/dbconfig/20251015-081827-root.json
  • 08:14 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2051.codfw.wmnet']
  • 08:14 elukey@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cp2052.codfw.wmnet']
  • 08:14 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2052.codfw.wmnet']
  • 08:13 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2052.codfw.wmnet']
  • 08:13 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es2032.codfw.wmnet onto es2053.codfw.wmnet
  • 08:13 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2032 gradually with 4 steps - Pool es2032.codfw.wmnet in after cloning
  • 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
  • 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
  • 08:04 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2052.codfw.wmnet']
  • 08:04 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2053.codfw.wmnet']
  • 08:04 slyngshede@dns1004: END - running authdns-update
  • 08:03 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83913 and previous config saved to /var/cache/conftool/dbconfig/20251015-080327-root.json
  • 08:03 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83912 and previous config saved to /var/cache/conftool/dbconfig/20251015-080321-root.json
  • 08:03 slyngshede@dns1004: START - running authdns-update
  • 08:02 slyngs: Moving CAS/IDP/SSO to Trixie.
  • 07:58 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2053.codfw.wmnet']
  • 07:57 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2054.codfw.wmnet']
  • 07:53 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling reboot on A:thanos-fe
  • 07:50 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2054.codfw.wmnet']
  • 07:50 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2055.codfw.wmnet']
  • 07:50 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2055.codfw.wmnet']
  • 07:48 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83910 and previous config saved to /var/cache/conftool/dbconfig/20251015-074821-root.json
  • 07:48 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83909 and previous config saved to /var/cache/conftool/dbconfig/20251015-074815-root.json
  • 07:33 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83907 and previous config saved to /var/cache/conftool/dbconfig/20251015-073316-root.json
  • 07:33 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83906 and previous config saved to /var/cache/conftool/dbconfig/20251015-073309-root.json
  • 07:28 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2032 gradually with 4 steps - Pool es2032.codfw.wmnet in after cloning
  • 07:27 kharlan@deploy2002: Finished scap sync-world: Backport for hCaptcha: Enable on enwiki (T402366) (duration: 09m 02s)
  • 07:23 kharlan@deploy2002: kharlan: Continuing with sync
  • 07:21 kharlan@deploy2002: kharlan: Backport for hCaptcha: Enable on enwiki (T402366) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org
  • 07:18 kharlan@deploy2002: Started scap sync-world: Backport for hCaptcha: Enable on enwiki (T402366)
  • 07:18 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83904 and previous config saved to /var/cache/conftool/dbconfig/20251015-071810-root.json
  • 07:18 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83903 and previous config saved to /var/cache/conftool/dbconfig/20251015-071803-root.json
  • 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
  • 07:03 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83901 and previous config saved to /var/cache/conftool/dbconfig/20251015-070304-root.json
  • 07:03 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83900 and previous config saved to /var/cache/conftool/dbconfig/20251015-070258-root.json
  • 06:48 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83899 and previous config saved to /var/cache/conftool/dbconfig/20251015-064758-root.json
  • 06:47 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83898 and previous config saved to /var/cache/conftool/dbconfig/20251015-064752-root.json
  • 06:46 jmm@dns1004: END - running authdns-update
  • 06:45 jmm@dns1004: START - running authdns-update
  • 06:32 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83897 and previous config saved to /var/cache/conftool/dbconfig/20251015-063252-root.json
  • 06:32 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83896 and previous config saved to /var/cache/conftool/dbconfig/20251015-063246-root.json
  • 06:17 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83895 and previous config saved to /var/cache/conftool/dbconfig/20251015-061746-root.json
  • 06:17 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83894 and previous config saved to /var/cache/conftool/dbconfig/20251015-061740-root.json
  • 06:13 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1032.eqiad.wmnet onto es1055.eqiad.wmnet
  • 06:13 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1032 gradually with 4 steps - Pool es1032.eqiad.wmnet in after cloning
  • 06:13 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1031.eqiad.wmnet onto es1054.eqiad.wmnet
  • 06:13 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1031 gradually with 4 steps - Pool es1031.eqiad.wmnet in after cloning
  • 06:02 marostegui@cumin1003: dbctl commit (dc=all): 'es1057 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83891 and previous config saved to /var/cache/conftool/dbconfig/20251015-060240-root.json
  • 06:02 marostegui@cumin1003: dbctl commit (dc=all): 'es1052 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83890 and previous config saved to /var/cache/conftool/dbconfig/20251015-060234-root.json
  • 06:02 marostegui@cumin1003: dbctl commit (dc=all): 'Add es1052 and es1057 to dbctl depooled T406488', diff saved to https://phabricator.wikimedia.org/P83889 and previous config saved to /var/cache/conftool/dbconfig/20251015-060210-marostegui.json
  • 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db1260.eqiad.wmnet onto db1261.eqiad.wmnet
  • 05:54 marostegui@cumin1003: dbctl commit (dc=all): 'Add db1260 to dbctl depooled T406550', diff saved to https://phabricator.wikimedia.org/P83886 and previous config saved to /var/cache/conftool/dbconfig/20251015-055457-marostegui.json
  • 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2206 - Depool db2206.codfw.wmnet to then clone it to db2248.codfw.wmnet - marostegui@cumin1003
  • 05:43 marostegui@cumin1003: START - Cookbook sre.mysql.depool db2206 - Depool db2206.codfw.wmnet to then clone it to db2248.codfw.wmnet - marostegui@cumin1003
  • 05:43 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db2206.codfw.wmnet onto db2248.codfw.wmnet
  • 05:27 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1032 gradually with 4 steps - Pool es1032.eqiad.wmnet in after cloning
  • 05:27 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1031 gradually with 4 steps - Pool es1031.eqiad.wmnet in after cloning
  • 04:56 eileen: civicrm upgraded from 4d3107fc to 25df5996
  • 01:40 musikanimal@deploy2002: Finished scap sync-world: Backport for Make tags be links to wish-index with filter applied (T406719) (duration: 07m 25s)
  • 01:36 musikanimal@deploy2002: hmonroy, musikanimal: Continuing with sync
  • 01:35 musikanimal@deploy2002: hmonroy, musikanimal: Backport for Make tags be links to wish-index with filter applied (T406719) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 01:33 musikanimal@deploy2002: Started scap sync-world: Backport for Make tags be links to wish-index with filter applied (T406719)
  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 06s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-14

  • 23:39 musikanimal@deploy2002: Finished scap sync-world: Backport for wish-index: pass in wishesData so that initial filters are set (T400945) (duration: 07m 08s)
  • 23:35 musikanimal@deploy2002: musikanimal: Continuing with sync
  • 23:34 musikanimal@deploy2002: musikanimal: Backport for wish-index: pass in wishesData so that initial filters are set (T400945) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:32 musikanimal@deploy2002: Started scap sync-world: Backport for wish-index: pass in wishesData so that initial filters are set (T400945)
  • 21:55 greg-g: (from eileen) civicrm upgraded from f68c287a to 4d3107fc
  • 21:43 ladsgroup@deploy2002: Finished scap sync-world: Backport for Set reader experiment to true (T406916) (duration: 11m 26s)
  • 21:38 ladsgroup@deploy2002: ksarabia, ladsgroup: Continuing with sync
  • 21:34 ladsgroup@deploy2002: ksarabia, ladsgroup: Backport for Set reader experiment to true (T406916) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:32 ladsgroup@deploy2002: Started scap sync-world: Backport for Set reader experiment to true (T406916)
  • 21:31 ladsgroup@deploy2002: Finished scap sync-world: Backport for ImageBrowsing: fix UI bugs in Overlay, DetailView and VTOC (T405992) (duration: 14m 22s)
  • 21:25 ladsgroup@deploy2002: ksarabia, ladsgroup: Continuing with sync
  • 21:19 ladsgroup@deploy2002: ksarabia, ladsgroup: Backport for ImageBrowsing: fix UI bugs in Overlay, DetailView and VTOC (T405992) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:17 ladsgroup@deploy2002: Started scap sync-world: Backport for ImageBrowsing: fix UI bugs in Overlay, DetailView and VTOC (T405992)
  • 21:16 ladsgroup@deploy2002: Finished scap sync-world: Backport for Revert "Add icons for wikibase changes. WIP" (duration: 16m 34s)
  • 21:10 ladsgroup@deploy2002: neslihanturan, ladsgroup: Continuing with sync
  • 21:04 ladsgroup@deploy2002: neslihanturan, ladsgroup: Backport for Revert "Add icons for wikibase changes. WIP" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:59 ladsgroup@deploy2002: Started scap sync-world: Backport for Revert "Add icons for wikibase changes. WIP"
  • 20:37 toyofuku@deploy2002: Finished scap sync-world: Backport for Add ReadingList Stream to EventStreamConfig (T406627) (duration: 11m 58s)
  • 20:30 toyofuku@deploy2002: lmora, toyofuku: Continuing with sync
  • 20:29 toyofuku@deploy2002: lmora, toyofuku: Backport for Add ReadingList Stream to EventStreamConfig (T406627) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:25 toyofuku@deploy2002: Started scap sync-world: Backport for Add ReadingList Stream to EventStreamConfig (T406627)
  • 20:21 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5002.eqsin.wmnet with OS trixie
  • 20:17 kemayo@deploy2002: Finished scap sync-world: Backport for Suggestions mode (T399612) (duration: 12m 47s)
  • 20:09 kemayo@deploy2002: kemayo: Continuing with sync
  • 20:09 kemayo@deploy2002: kemayo: Backport for Suggestions mode (T399612) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:05 kemayo@deploy2002: Started scap sync-world: Backport for Suggestions mode (T399612)
  • 19:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5002.eqsin.wmnet with reason: host reimage
  • 19:56 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 19:56 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 19:56 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 19:56 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 19:55 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5002.eqsin.wmnet with reason: host reimage
  • 19:55 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 19:55 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 19:54 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 19:54 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 19:53 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 19:53 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 19:51 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 19:51 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 19:51 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 19:50 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 19:50 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 19:50 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 19:49 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 19:49 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 19:41 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 19:40 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 19:40 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 19:39 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 19:39 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 19:38 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 19:38 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: apply
  • 19:38 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/recommendation-api: apply
  • 19:36 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
  • 19:36 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
  • 19:35 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 19:35 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 19:34 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 19:34 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 19:32 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 19:30 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 19:29 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 19:29 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 19:29 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 19:28 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 19:15 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 19:09 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 19:08 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 19:08 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 19:06 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 19:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 19:05 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
  • 19:05 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
  • 19:04 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 19:04 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 19:03 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 19:03 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 19:03 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 19:02 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 19:01 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 19:01 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 19:00 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 19:00 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 18:59 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum5002.eqsin.wmnet with OS trixie
  • 18:59 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:59 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:57 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:57 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 18:55 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 18:55 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 18:55 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 18:55 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 18:55 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
  • 18:54 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply
  • 18:53 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 18:53 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 18:52 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 18:52 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 18:52 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 18:51 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 18:49 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
  • 18:49 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
  • 18:48 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 18:48 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 18:46 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:46 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:46 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum4002.ulsfo.wmnet with OS trixie
  • 18:44 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum2002.codfw.wmnet with OS trixie
  • 18:44 rzl: rzl@deploy1003:~$ kube-env mw-script-deploy codfw; helm uninstall amfcta11 # HelmReleaseBadStatus alert was firing for this mw-script job in state pending-install, even though the job was long since finished
  • 18:38 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum1001.eqiad.wmnet with OS trixie
  • 18:36 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:35 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
  • 18:34 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
  • 18:34 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:34 ejegg: fundraising civicrm upgraded from 9393addf to f68c287a
  • 18:32 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 18:32 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum4002.ulsfo.wmnet with reason: host reimage
  • 18:32 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 18:31 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 18:31 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:31 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:31 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 18:31 brett@dns1004: END - running authdns-update
  • 18:29 brett@dns1004: START - running authdns-update
  • 18:29 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 18:28 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum4002.ulsfo.wmnet with reason: host reimage
  • 18:28 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/apertium: apply
  • 18:26 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum2002.codfw.wmnet with reason: host reimage
  • 18:23 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum2002.codfw.wmnet with reason: host reimage
  • 18:23 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 18:22 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 18:22 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 18:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1001.eqiad.wmnet with reason: host reimage
  • 18:22 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 18:19 brett@dns1004: END - running authdns-update
  • 18:18 brett@dns1004: START - running authdns-update
  • 18:17 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1001.eqiad.wmnet with reason: host reimage
  • 18:11 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 18:11 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 18:11 swfrench@deploy2002: Finished scap sync-world: Backport for Enroll 1% of client sessions in PHP 8.3 (T405955) (duration: 19m 18s)
  • 18:03 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum4002.ulsfo.wmnet with OS trixie
  • 18:03 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum2002.codfw.wmnet with OS trixie
  • 18:03 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum1001.eqiad.wmnet with OS trixie
  • 18:01 swfrench@deploy2002: swfrench: Continuing with sync
  • 17:56 swfrench@deploy2002: swfrench: Backport for Enroll 1% of client sessions in PHP 8.3 (T405955) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 17:52 swfrench@deploy2002: Started scap sync-world: Backport for Enroll 1% of client sessions in PHP 8.3 (T405955)
  • 17:48 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 17:48 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 17:41 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:40 tchin@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:19 swfrench@deploy2002: Finished scap sync-world: Non-image-build scap run to scale 8.3 deployments - T405955 (duration: 05m 41s)
  • 17:15 swfrench@deploy2002: Started scap sync-world: Non-image-build scap run to scale 8.3 deployments - T405955
  • 16:55 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 16:55 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 16:43 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 16:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 16:36 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 16:36 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 16:32 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 16:32 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 16:28 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 16:27 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 16:27 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:restbase-codfw
  • 16:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 16:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 16:19 mutante: rebooting backend of releases.wikimedia.org
  • 16:19 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on releases1003.eqiad.wmnet with reason: reboot
  • 16:18 fceratto@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host db-test1003.eqiad.wmnet
  • 16:18 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db-test1003.eqiad.wmnet with OS trixie
  • 16:17 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 16:16 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 16:12 mutante: rebooting phab2002
  • 16:11 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on phab2002.codfw.wmnet with reason: reboot
  • 16:04 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db-test1003.eqiad.wmnet with reason: host reimage
  • 16:03 mutante: CI should be back in operation as normal
  • 15:57 mutante: rebooting main CI server - integration.wikimedia.org will be down for a minute
  • 15:57 fceratto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db-test1003.eqiad.wmnet with reason: host reimage
  • 15:56 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on contint1002.wikimedia.org with reason: reboot
  • 15:50 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on contint2002.wikimedia.org with reason: reboot
  • 15:50 mutante: contint2002 - rebooting - (not the manager host)
  • 15:47 fceratto@cumin1002: START - Cookbook sre.hosts.reimage for host db-test1003.eqiad.wmnet with OS trixie
  • 15:46 swfrench-wmf: rolling run-puppet-agent on A:cp hosts - T405955
  • 15:33 swfrench-wmf: disable-puppet on A:cp hosts - T405955
  • 15:30 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test1003.eqiad.wmnet - fceratto@cumin1002"
  • 15:30 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test1003.eqiad.wmnet - fceratto@cumin1002"
  • 15:30 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db-test1003.eqiad.wmnet on all recursors
  • 15:30 fceratto@cumin1002: START - Cookbook sre.dns.wipe-cache db-test1003.eqiad.wmnet on all recursors
  • 15:30 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:30 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test1003.eqiad.wmnet - fceratto@cumin1002"
  • 15:21 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test1003.eqiad.wmnet - fceratto@cumin1002"
  • 15:20 moritzm: installing jq security updates
  • 15:17 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-logging-eqiad
  • 15:05 fceratto@cumin1002: START - Cookbook sre.dns.netbox
  • 15:05 fceratto@cumin1002: START - Cookbook sre.ganeti.makevm for new host db-test1003.eqiad.wmnet
  • 15:04 brennen@deploy2002: Finished deploy [phabricator/deployment@16c9739]: deploy phab1004 for T407244 (duration: 00m 58s)
  • 15:03 brennen@deploy2002: Started deploy [phabricator/deployment@16c9739]: deploy phab1004 for T407244
  • 15:03 brennen@deploy2002: Finished deploy [phabricator/deployment@16c9739]: deploy phab2002 for T407244 (duration: 00m 31s)
  • 15:02 brennen@deploy2002: Started deploy [phabricator/deployment@16c9739]: deploy phab2002 for T407244
  • 14:58 arnaudb@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on phab2002.codfw.wmnet,phab[1004-1005].eqiad.wmnet with reason: T407244
  • 14:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wikidata: apply
  • 14:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply
  • 14:36 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 14:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 14:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wikidata: apply
  • 14:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply
  • 14:32 fceratto@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host db-test1001.eqiad.wmnet
  • 14:32 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db-test1001.eqiad.wmnet with OS trixie
  • 14:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:30 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp7001*} or P{cp4037*} and A:cp
  • 14:30 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp4037.ulsfo.wmnet
  • 14:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
  • 14:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
  • 14:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
  • 14:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
  • 14:26 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:26 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-logging-eqiad
  • 14:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 14:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 14:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 14:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 14:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 14:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 14:20 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 14:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 14:18 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db-test1001.eqiad.wmnet with reason: host reimage
  • 14:18 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-logging-codfw
  • 14:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:17 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2055.codfw.wmnet']
  • 14:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:14 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:12 fceratto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db-test1001.eqiad.wmnet with reason: host reimage
  • 14:11 samtar@deploy2002: Finished scap sync-world: Backport for ext.wikimediaEvents.WatchlistBaseline: Send source/instrument (T401575) (duration: 09m 25s)
  • 14:09 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2055.codfw.wmnet']
  • 14:09 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 14:09 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 14:07 samtar@deploy2002: samtar: Continuing with sync
  • 14:06 samtar@deploy2002: samtar: Backport for ext.wikimediaEvents.WatchlistBaseline: Send source/instrument (T401575) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:02 samtar@deploy2002: Started scap sync-world: Backport for ext.wikimediaEvents.WatchlistBaseline: Send source/instrument (T401575)
  • 14:02 fceratto@cumin1002: START - Cookbook sre.hosts.reimage for host db-test1001.eqiad.wmnet with OS trixie
  • 14:01 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test1001.eqiad.wmnet - fceratto@cumin1002"
  • 14:01 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test1001.eqiad.wmnet - fceratto@cumin1002"
  • 14:00 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db-test1001.eqiad.wmnet on all recursors
  • 14:00 fceratto@cumin1002: START - Cookbook sre.dns.wipe-cache db-test1001.eqiad.wmnet on all recursors
  • 14:00 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:00 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test1001.eqiad.wmnet - fceratto@cumin1002"
  • 14:00 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test1001.eqiad.wmnet - fceratto@cumin1002"
  • 14:00 phuedx@deploy2002: Finished scap sync-world: Backport for ext.wikimediaEvents: simple-bot-detection: Use correct schema, ext.wikimediaEvents: simple-bot-detection: Use correct schema (duration: 10m 17s)
  • 13:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 13:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 13:56 fceratto@cumin1002: START - Cookbook sre.dns.netbox
  • 13:56 fceratto@cumin1002: START - Cookbook sre.ganeti.makevm for new host db-test1001.eqiad.wmnet
  • 13:56 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 13:56 phuedx@deploy2002: phuedx: Continuing with sync
  • 13:55 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 13:54 phuedx@deploy2002: phuedx: Backport for ext.wikimediaEvents: simple-bot-detection: Use correct schema, ext.wikimediaEvents: simple-bot-detection: Use correct schema synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:53 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 13:53 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 13:52 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 13:49 phuedx@deploy2002: Started scap sync-world: Backport for ext.wikimediaEvents: simple-bot-detection: Use correct schema, ext.wikimediaEvents: simple-bot-detection: Use correct schema
  • 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1003.wikimedia.org
  • 13:46 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 13:46 sukhe@cumin1003: cookbooks.sre.cdn.roll-reboot finished rebooting cp7001.magru.wmnet
  • 13:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1003.wikimedia.org
  • 13:42 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 13:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2003.wikimedia.org
  • 13:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 13:39 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 13:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 13:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 13:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2003.wikimedia.org
  • 13:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:34 sukhe@cumin1003: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp7001*} or P{cp4037*} and A:cp
  • 13:31 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-logging-codfw
  • 13:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:26 logmsgbot: daniel Deployed security patch for T405859
  • 13:19 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:restbase-codfw
  • 13:16 logmsgbot: daniel Deployed security patch for T405859
  • 13:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wikidata: apply
  • 13:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply
  • 13:07 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 13:06 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:05 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2084.codfw.wmnet with OS bullseye
  • 13:03 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-wikidata: apply
  • 13:03 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-wikidata: apply
  • 13:00 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1095.eqiad.wmnet
  • 12:53 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1095.eqiad.wmnet
  • 12:51 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1094.eqiad.wmnet with OS bullseye
  • 12:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2084.codfw.wmnet with reason: host reimage
  • 12:47 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:46 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:46 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2084.codfw.wmnet with reason: host reimage
  • 12:45 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:39 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2032 - Depool es2032.codfw.wmnet to then clone it to es2053.codfw.wmnet - fceratto@cumin1002
  • 12:39 fceratto@cumin1002: START - Cookbook sre.mysql.depool es2032 - Depool es2032.codfw.wmnet to then clone it to es2053.codfw.wmnet - fceratto@cumin1002
  • 12:39 fceratto@cumin1002: START - Cookbook sre.mysql.clone_es of es2032.codfw.wmnet onto es2053.codfw.wmnet
  • 12:34 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye
  • 12:33 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1094.eqiad.wmnet with reason: host reimage
  • 12:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1247.eqiad.wmnet onto db1260.eqiad.wmnet
  • 12:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye
  • 12:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1247 gradually with 4 steps - Pool db1247.eqiad.wmnet in after cloning
  • 12:30 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1094.eqiad.wmnet with reason: host reimage
  • 12:18 dbrant@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 12:17 dbrant@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 12:17 dbrant@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 12:16 dbrant@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 12:15 dbrant@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 12:15 dbrant@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 12:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
  • 12:13 dbrant@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 12:12 dbrant@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 12:12 dbrant@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 12:08 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be1094.eqiad.wmnet with OS bullseye
  • 12:08 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
  • 12:07 ladsgroup@deploy2002: Finished scap sync-world: Backport for filebackend: Remove consistency check for multi-backend (T328872) (duration: 12m 46s)
  • 12:07 dbrant@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 12:07 mvernon@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be1094.eqiad.wmnet
  • 12:03 dbrant@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:03 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 12:03 dbrant@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 11:59 ladsgroup@deploy2002: ladsgroup: Backport for filebackend: Remove consistency check for multi-backend (T328872) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:54 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
  • 11:54 ladsgroup@deploy2002: Started scap sync-world: Backport for filebackend: Remove consistency check for multi-backend (T328872)
  • 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host seaborgium.wikimedia.org
  • 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host seaborgium.wikimedia.org
  • 11:46 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1094.eqiad.wmnet
  • 11:46 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1093.eqiad.wmnet
  • 11:45 marostegui@cumin1003: START - Cookbook sre.mysql.pool db1247 gradually with 4 steps - Pool db1247.eqiad.wmnet in after cloning
  • 11:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:39 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1093.eqiad.wmnet
  • 11:39 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1092.eqiad.wmnet
  • 11:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:32 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1092.eqiad.wmnet
  • 11:32 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1091.eqiad.wmnet
  • 11:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2089.codfw.wmnet
  • 11:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:26 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1091.eqiad.wmnet
  • 11:26 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1090.eqiad.wmnet
  • 11:23 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2089.codfw.wmnet
  • 11:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2088.codfw.wmnet
  • 11:19 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1090.eqiad.wmnet
  • 11:18 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1089.eqiad.wmnet
  • 11:16 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2088.codfw.wmnet
  • 11:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2087.codfw.wmnet
  • 10:58 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1087.eqiad.wmnet
  • 10:58 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1086.eqiad.wmnet
  • 10:57 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2085.codfw.wmnet
  • 10:55 fceratto@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host db-test1002.eqiad.wmnet
  • 10:55 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db-test1002.eqiad.wmnet with OS trixie
  • 10:51 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1086.eqiad.wmnet
  • 10:51 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1085.eqiad.wmnet
  • 10:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2082.codfw.wmnet
  • 10:49 hashar: Restarted Zuul to have it reconnect to Gerrit
  • 10:48 fabfur: enable puppet on all DNS hosts for manual gerrit switch (T407200)
  • 10:44 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1085.eqiad.wmnet
  • 10:44 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1084.eqiad.wmnet
  • 10:43 arnaudb@dns1004: END - running authdns-update
  • 10:43 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db-test1002.eqiad.wmnet with reason: host reimage
  • 10:43 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2082.codfw.wmnet
  • 10:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2081.codfw.wmnet
  • 10:42 arnaudb@dns1004: START - running authdns-update
  • 10:38 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1084.eqiad.wmnet
  • 10:38 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1083.eqiad.wmnet
  • 10:37 fceratto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db-test1002.eqiad.wmnet with reason: host reimage
  • 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2081.codfw.wmnet
  • 10:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2080.codfw.wmnet
  • 10:34 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on es2053.codfw.wmnet with reason: Setting up new ES host
  • 10:31 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1083.eqiad.wmnet
  • 10:31 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1082.eqiad.wmnet
  • 10:28 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2080.codfw.wmnet
  • 10:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2079.codfw.wmnet
  • 10:27 fceratto@cumin1002: START - Cookbook sre.hosts.reimage for host db-test1002.eqiad.wmnet with OS trixie
  • 10:23 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1082.eqiad.wmnet
  • 10:21 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2079.codfw.wmnet
  • 10:20 fabfur: disabling puppet on all DNS hosts for manual gerrit switch (T407200)
  • 10:18 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test1002.eqiad.wmnet - fceratto@cumin1002"
  • 10:18 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test1002.eqiad.wmnet - fceratto@cumin1002"
  • 10:17 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db-test1002.eqiad.wmnet on all recursors
  • 10:17 fceratto@cumin1002: START - Cookbook sre.dns.wipe-cache db-test1002.eqiad.wmnet on all recursors
  • 10:17 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:17 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test1002.eqiad.wmnet - fceratto@cumin1002"
  • 10:16 mvernon@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be1081.eqiad.wmnet
  • 10:15 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test1002.eqiad.wmnet - fceratto@cumin1002"
  • 10:09 fceratto@cumin1002: START - Cookbook sre.dns.netbox
  • 10:09 fceratto@cumin1002: START - Cookbook sre.ganeti.makevm for new host db-test1002.eqiad.wmnet
  • 10:04 Amir1: mwscript-k8s --follow --dblist=group0 -- purgeUserOptions.php (T406724)
  • 09:58 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2077.codfw.wmnet
  • 09:52 fceratto@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host db-test2002.codfw.wmnet
  • 09:52 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db-test2002.codfw.wmnet with OS trixie
  • 09:50 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2077.codfw.wmnet
  • 09:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2076.codfw.wmnet
  • 09:41 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2076.codfw.wmnet
  • 09:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2075.codfw.wmnet
  • 09:38 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db-test2002.codfw.wmnet with reason: host reimage
  • 09:37 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
  • 09:34 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2075.codfw.wmnet
  • 09:34 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2074.codfw.wmnet
  • 09:33 fceratto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db-test2002.codfw.wmnet with reason: host reimage
  • 09:26 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2074.codfw.wmnet
  • 09:26 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2073.codfw.wmnet
  • 09:25 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.failover (exit_code=99) from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 09:22 arnaudb@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gerrit.wikimedia.org gerrit-replica.wikimedia.org on all recursors
  • 09:22 arnaudb@cumin1003: START - Cookbook sre.dns.wipe-cache gerrit.wikimedia.org gerrit-replica.wikimedia.org on all recursors
  • 09:22 arnaudb@dns1004: END - running authdns-update
  • 09:19 fceratto@cumin1002: START - Cookbook sre.hosts.reimage for host db-test2002.codfw.wmnet with OS trixie
  • 09:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2073.codfw.wmnet
  • 09:18 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2072.codfw.wmnet
  • 09:18 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test2002.codfw.wmnet - fceratto@cumin1002"
  • 09:18 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test2002.codfw.wmnet - fceratto@cumin1002"
  • 09:17 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db-test2002.codfw.wmnet on all recursors
  • 09:17 fceratto@cumin1002: START - Cookbook sre.dns.wipe-cache db-test2002.codfw.wmnet on all recursors
  • 09:17 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:17 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test2002.codfw.wmnet - fceratto@cumin1002"
  • 09:17 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test2002.codfw.wmnet - fceratto@cumin1002"
  • 09:13 fceratto@cumin1002: START - Cookbook sre.dns.netbox
  • 09:12 fceratto@cumin1002: START - Cookbook sre.ganeti.makevm for new host db-test2002.codfw.wmnet
  • 09:11 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2072.codfw.wmnet
  • 09:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2071.codfw.wmnet
  • 09:10 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1081.eqiad.wmnet
  • 09:10 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1080.eqiad.wmnet
  • 09:05 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1080.eqiad.wmnet
  • 09:05 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1079.eqiad.wmnet
  • 09:04 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit2003.wikimedia.org
  • 09:04 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2071.codfw.wmnet
  • 09:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2070.codfw.wmnet
  • 09:04 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit2003.wikimedia.org
  • 09:02 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit1003.wikimedia.org
  • 09:02 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit1003.wikimedia.org
  • 09:02 arnaudb@cumin1003: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 09:02 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.topology-check (exit_code=0) Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 09:02 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 09:00 arnaudb@dns1004: START - running authdns-update
  • 08:57 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1079.eqiad.wmnet
  • 08:57 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1078.eqiad.wmnet
  • 08:56 topranks: enable new inter.link IP transit circuit on cr1-drms T401104
  • 08:56 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2070.codfw.wmnet
  • 08:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2069.codfw.wmnet
  • 08:50 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1078.eqiad.wmnet
  • 08:50 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1077.eqiad.wmnet
  • 08:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2069.codfw.wmnet
  • 08:47 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2068.codfw.wmnet
  • 08:45 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:44 brouberol@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:44 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:42 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:41 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1077.eqiad.wmnet
  • 08:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:40 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-eqiad
  • 08:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:39 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1076.eqiad.wmnet
  • 08:38 brouberol@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 08:37 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.23 refs T405679
  • 08:37 brouberol@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 08:34 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:33 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:33 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 08:32 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-eqiad
  • 08:31 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1076.eqiad.wmnet
  • 08:31 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2068.codfw.wmnet
  • 08:31 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1075.eqiad.wmnet
  • 08:30 brouberol@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2067.codfw.wmnet
  • 08:29 brouberol@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 08:26 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1247 - Depool db1247.eqiad.wmnet to then clone it to db1260.eqiad.wmnet - marostegui@cumin1003
  • 08:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool db1247 - Depool db1247.eqiad.wmnet to then clone it to db1260.eqiad.wmnet - marostegui@cumin1003
  • 08:25 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db1247.eqiad.wmnet onto db1260.eqiad.wmnet
  • 08:23 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1075.eqiad.wmnet
  • 08:23 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
  • 08:20 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1074.eqiad.wmnet
  • 08:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
  • 08:18 dcausse: closing the UTC morning backport window
  • 08:14 dcausse@deploy2002: Finished scap sync-world: Backport for cirrus: test completion with default sort on simplewiki [3/3] (T404858), ext-EventLogging: Allowlist product_metrics.web_base_with_ip stream (T406332) (duration: 10m 46s)
  • 08:12 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1074.eqiad.wmnet
  • 08:12 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2066.codfw.wmnet
  • 08:10 dcausse@deploy2002: dcausse, phuedx: Continuing with sync
  • 08:07 dcausse@deploy2002: dcausse, phuedx: Backport for cirrus: test completion with default sort on simplewiki [3/3] (T404858), ext-EventLogging: Allowlist product_metrics.web_base_with_ip stream (T406332) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:03 dcausse@deploy2002: Started scap sync-world: Backport for cirrus: test completion with default sort on simplewiki [3/3] (T404858), ext-EventLogging: Allowlist product_metrics.web_base_with_ip stream (T406332)
  • 08:02 dcausse@deploy2002: mwscript-k8s job started: namespaceDupes eswiktionary --fix # T407150
  • 08:01 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 08:01 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 08:00 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83863 and previous config saved to /var/cache/conftool/dbconfig/20251014-080025-root.json
  • 08:00 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:59 dcausse@deploy2002: Finished scap sync-world: Backport for [enwikibooks] Set $wgAutoConfirmCount to 5 (T407080), [eswiktionary] Create a Tesauro namespace (T407150), [kawiki] Enable NewUserMessage extension (T407076) (duration: 11m 29s)
  • 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83862 and previous config saved to /var/cache/conftool/dbconfig/20251014-075608-root.json
  • 07:54 dcausse@deploy2002: dcausse, superpes: Continuing with sync
  • 07:51 dcausse@deploy2002: dcausse, superpes: Backport for [enwikibooks] Set $wgAutoConfirmCount to 5 (T407080), [eswiktionary] Create a Tesauro namespace (T407150), [kawiki] Enable NewUserMessage extension (T407076) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:47 dcausse@deploy2002: Started scap sync-world: Backport for [enwikibooks] Set $wgAutoConfirmCount to 5 (T407080), [eswiktionary] Create a Tesauro namespace (T407150), [kawiki] Enable NewUserMessage extension (T407076)
  • 07:45 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83861 and previous config saved to /var/cache/conftool/dbconfig/20251014-074519-root.json
  • 07:43 dcausse@deploy2002: Finished scap sync-world: Backport for Implement new usage types for statement with qualifiers and references (T401290) (duration: 10m 50s)
  • 07:41 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83860 and previous config saved to /var/cache/conftool/dbconfig/20251014-074102-root.json
  • 07:39 dcausse@deploy2002: joelyrookewmde, dcausse: Continuing with sync
  • 07:36 dcausse@deploy2002: joelyrookewmde, dcausse: Backport for Implement new usage types for statement with qualifiers and references (T401290) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:32 dcausse@deploy2002: Started scap sync-world: Backport for Implement new usage types for statement with qualifiers and references (T401290)
  • 07:30 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83859 and previous config saved to /var/cache/conftool/dbconfig/20251014-073013-root.json
  • 07:28 dcausse@deploy2002: Finished scap sync-world: Backport for Remove artifact from Quechua Wikipedia wordmark (duration: 11m 46s)
  • 07:25 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83858 and previous config saved to /var/cache/conftool/dbconfig/20251014-072556-root.json
  • 07:22 dcausse@deploy2002: jhsoby, dcausse: Continuing with sync
  • 07:21 dcausse@deploy2002: jhsoby, dcausse: Backport for Remove artifact from Quechua Wikipedia wordmark synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:16 dcausse@deploy2002: Started scap sync-world: Backport for Remove artifact from Quechua Wikipedia wordmark
  • 07:15 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83857 and previous config saved to /var/cache/conftool/dbconfig/20251014-071507-root.json
  • 07:10 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83856 and previous config saved to /var/cache/conftool/dbconfig/20251014-071050-root.json
  • 07:00 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83855 and previous config saved to /var/cache/conftool/dbconfig/20251014-070001-root.json
  • 06:55 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83854 and previous config saved to /var/cache/conftool/dbconfig/20251014-065544-root.json
  • 06:44 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83853 and previous config saved to /var/cache/conftool/dbconfig/20251014-064455-root.json
  • 06:40 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83852 and previous config saved to /var/cache/conftool/dbconfig/20251014-064038-root.json
  • 06:37 marostegui@cumin1003: dbctl commit (dc=all): 'db1244 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83851 and previous config saved to /var/cache/conftool/dbconfig/20251014-063724-root.json
  • 06:29 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83850 and previous config saved to /var/cache/conftool/dbconfig/20251014-062949-root.json
  • 06:25 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83848 and previous config saved to /var/cache/conftool/dbconfig/20251014-062532-root.json
  • 06:22 marostegui@cumin1003: dbctl commit (dc=all): 'db1244 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83847 and previous config saved to /var/cache/conftool/dbconfig/20251014-062218-root.json
  • 06:21 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb2002-dev.wikimedia.org
  • 06:14 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83846 and previous config saved to /var/cache/conftool/dbconfig/20251014-061444-root.json
  • 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es1032 - Depool es1032.eqiad.wmnet to then clone it to es1055.eqiad.wmnet - marostegui@cumin1003
  • 06:14 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudweb2002-dev.wikimedia.org
  • 06:10 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83845 and previous config saved to /var/cache/conftool/dbconfig/20251014-061026-root.json
  • 06:07 marostegui@cumin1003: dbctl commit (dc=all): 'db1244 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83844 and previous config saved to /var/cache/conftool/dbconfig/20251014-060712-root.json
  • 05:59 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83843 and previous config saved to /var/cache/conftool/dbconfig/20251014-055938-root.json
  • 05:55 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83842 and previous config saved to /var/cache/conftool/dbconfig/20251014-055520-root.json
  • 05:53 marostegui@cumin1003: START - Cookbook sre.mysql.depool es1032 - Depool es1032.eqiad.wmnet to then clone it to es1055.eqiad.wmnet - marostegui@cumin1003
  • 05:53 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1032.eqiad.wmnet onto es1055.eqiad.wmnet
  • 05:52 marostegui@cumin1003: dbctl commit (dc=all): 'db1244 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83840 and previous config saved to /var/cache/conftool/dbconfig/20251014-055206-root.json
  • 05:46 marostegui@cumin1003: dbctl commit (dc=all): 'db1221 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83839 and previous config saved to /var/cache/conftool/dbconfig/20251014-054631-root.json
  • 05:44 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83838 and previous config saved to /var/cache/conftool/dbconfig/20251014-054432-root.json
  • 05:43 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 05:42 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1244 T407176', diff saved to https://phabricator.wikimedia.org/P83837 and previous config saved to /var/cache/conftool/dbconfig/20251014-054200-marostegui.json
  • 05:41 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1160 to s4 primary T407176', diff saved to https://phabricator.wikimedia.org/P83836 and previous config saved to /var/cache/conftool/dbconfig/20251014-054118-marostegui.json
  • 05:41 marostegui: Starting s4 eqiad failover from db1244 to db1160 - T407176
  • 05:40 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83835 and previous config saved to /var/cache/conftool/dbconfig/20251014-054014-root.json
  • 05:37 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s4 T407176
  • 05:36 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1160 with weight 0 T407176', diff saved to https://phabricator.wikimedia.org/P83834 and previous config saved to /var/cache/conftool/dbconfig/20251014-053654-marostegui.json
  • 05:31 marostegui@cumin1003: dbctl commit (dc=all): 'db1221 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83833 and previous config saved to /var/cache/conftool/dbconfig/20251014-053125-root.json
  • 05:29 marostegui@cumin1003: dbctl commit (dc=all): 'es1053 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83832 and previous config saved to /var/cache/conftool/dbconfig/20251014-052926-root.json
  • 05:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es1031 - Depool es1031.eqiad.wmnet to then clone it to es1054.eqiad.wmnet - marostegui@cumin1003
  • 05:26 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1033.eqiad.wmnet onto es1056.eqiad.wmnet
  • 05:26 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1033 gradually with 4 steps - Pool es1033.eqiad.wmnet in after cloning
  • 05:25 marostegui@cumin1003: dbctl commit (dc=all): 'es1050 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83830 and previous config saved to /var/cache/conftool/dbconfig/20251014-052508-root.json
  • 05:20 marostegui@cumin1003: START - Cookbook sre.mysql.depool es1031 - Depool es1031.eqiad.wmnet to then clone it to es1054.eqiad.wmnet - marostegui@cumin1003
  • 05:20 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1031.eqiad.wmnet onto es1054.eqiad.wmnet
  • 05:16 marostegui@cumin1003: dbctl commit (dc=all): 'db1221 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83828 and previous config saved to /var/cache/conftool/dbconfig/20251014-051619-root.json
  • 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1031-1032].eqiad.wmnet with reason: Cloning
  • 05:01 marostegui@cumin1003: dbctl commit (dc=all): 'db1221 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83826 and previous config saved to /var/cache/conftool/dbconfig/20251014-050113-root.json
  • 04:53 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1221 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83824 and previous config saved to /var/cache/conftool/dbconfig/20251014-045305-marostegui.json
  • 04:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 04:52 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 14 hosts with reason: Upgrading
  • 04:41 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1033 gradually with 4 steps - Pool es1033.eqiad.wmnet in after cloning
  • 04:02 mwpresync@deploy2002: Pruned MediaWiki: 1.45.0-wmf.20 (duration: 02m 42s)
  • 03:48 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.45.0-wmf.23 refs T405679 (duration: 45m 02s)
  • 03:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.45.0-wmf.23 refs T405679
  • 02:24 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2003.codfw.wmnet
  • 02:20 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2003.codfw.wmnet
  • 02:09 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
  • 02:05 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
  • 01:58 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog2002.codfw.wmnet
  • 01:52 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host mwlog2002.codfw.wmnet
  • 01:45 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog1002.eqiad.wmnet
  • 01:39 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host mwlog1002.eqiad.wmnet
  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 20s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-13

  • 23:50 musikanimal@deploy2002: Finished scap sync-world: Backport for Add 'accepted' status (T406674) (duration: 40m 01s)
  • 23:38 musikanimal@deploy2002: musikanimal: Continuing with sync
  • 23:36 musikanimal@deploy2002: musikanimal: Backport for Add 'accepted' status (T406674) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:29 btullis@cumin1003: END (PASS) - Cookbook sre.presto.reboot-workers (exit_code=0) for Presto an-presto cluster: Reboot Presto nodes
  • 23:10 musikanimal@deploy2002: Started scap sync-world: Backport for Add 'accepted' status (T406674)
  • 22:34 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon2003.codfw.wmnet
  • 22:30 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host kafkamon2003.codfw.wmnet
  • 22:01 btullis@cumin1003: START - Cookbook sre.presto.reboot-workers for Presto an-presto cluster: Reboot Presto nodes
  • 22:01 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 22:01 btullis@cumin1003: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid analytics cluster: Reboot Druid nodes
  • 22:00 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 21:52 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon1003.eqiad.wmnet
  • 21:48 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host kafkamon1003.eqiad.wmnet
  • 21:05 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 21:05 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2004.codfw.wmnet
  • 21:03 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 20:57 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host graphite2004.codfw.wmnet
  • 20:56 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1005.eqiad.wmnet
  • 20:52 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host graphite1005.eqiad.wmnet
  • 20:52 btullis@cumin1003: START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes
  • 20:45 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp2001.codfw.wmnet
  • 20:39 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host arclamp2001.codfw.wmnet
  • 20:34 eileen: civicrm upgraded from 385f00d8 to 9393addf
  • 20:25 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp1001.eqiad.wmnet
  • 20:22 dani@deploy2002: Finished scap sync-world: Backport for Undeploy Design Research participant recruitment survey on jawiki (T405577) (duration: 09m 01s)
  • 20:19 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host arclamp1001.eqiad.wmnet
  • 20:18 dani@deploy2002: dani: Continuing with sync
  • 20:17 dani@deploy2002: dani: Backport for Undeploy Design Research participant recruitment survey on jawiki (T405577) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:13 dani@deploy2002: Started scap sync-world: Backport for Undeploy Design Research participant recruitment survey on jawiki (T405577)
  • 19:44 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1027.eqiad.wmnet onto es1050.eqiad.wmnet
  • 19:44 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1027 gradually with 4 steps - Pool es1027.eqiad.wmnet in after cloning
  • 18:59 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1027 gradually with 4 steps - Pool es1027.eqiad.wmnet in after cloning
  • 17:59 fceratto@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host db-test2001.codfw.wmnet
  • 17:59 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db-test2001.codfw.wmnet with OS trixie
  • 17:43 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db-test2001.codfw.wmnet with reason: host reimage
  • 17:37 fceratto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db-test2001.codfw.wmnet with reason: host reimage
  • 17:19 fceratto@cumin1002: START - Cookbook sre.hosts.reimage for host db-test2001.codfw.wmnet with OS trixie
  • 17:19 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test2001.codfw.wmnet - fceratto@cumin1002"
  • 17:19 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db-test2001.codfw.wmnet - fceratto@cumin1002"
  • 17:18 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db-test2001.codfw.wmnet on all recursors
  • 17:18 fceratto@cumin1002: START - Cookbook sre.dns.wipe-cache db-test2001.codfw.wmnet on all recursors
  • 17:18 fceratto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:18 fceratto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test2001.codfw.wmnet - fceratto@cumin1002"
  • 17:17 fceratto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db-test2001.codfw.wmnet - fceratto@cumin1002"
  • 17:14 fceratto@cumin1002: START - Cookbook sre.dns.netbox
  • 17:14 fceratto@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:11 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:restbase-eqiad
  • 17:11 fceratto@cumin1002: START - Cookbook sre.dns.netbox
  • 17:11 fceratto@cumin1002: START - Cookbook sre.ganeti.makevm for new host db-test2001.codfw.wmnet
  • 17:10 fceratto@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host db-test1001.eqiad.wmnet
  • 17:10 fceratto@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:08 fceratto@cumin1002: START - Cookbook sre.dns.netbox
  • 17:02 fceratto@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:59 fceratto@cumin1002: START - Cookbook sre.dns.netbox
  • 16:59 fceratto@cumin1002: START - Cookbook sre.ganeti.makevm for new host db-test1001.eqiad.wmnet
  • 16:05 volans@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin1001.eqiad.wmnet
  • 15:59 volans@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
  • 15:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2065.codfw.wmnet
  • 15:54 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1073.eqiad.wmnet
  • 15:51 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin2001.codfw.wmnet
  • 15:50 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
  • 15:49 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2064.codfw.wmnet
  • 15:47 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcumin2001.codfw.wmnet
  • 15:46 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1073.eqiad.wmnet
  • 15:46 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1072.eqiad.wmnet
  • 15:42 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2064.codfw.wmnet
  • 15:42 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
  • 15:39 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1072.eqiad.wmnet
  • 15:39 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1071.eqiad.wmnet
  • 15:31 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1071.eqiad.wmnet
  • 15:31 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1070.eqiad.wmnet
  • 15:29 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
  • 15:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2062.codfw.wmnet
  • 15:24 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1070.eqiad.wmnet
  • 15:24 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1069.eqiad.wmnet
  • 15:23 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2062.codfw.wmnet
  • 15:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
  • 15:17 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1069.eqiad.wmnet
  • 15:17 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
  • 15:16 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1068.eqiad.wmnet
  • 15:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
  • 15:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd1005.eqiad.wmnet
  • 15:12 btullis@cumin1003: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid public cluster: Reboot Druid nodes
  • 15:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd1005.eqiad.wmnet
  • 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd1004.eqiad.wmnet
  • 15:09 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1068.eqiad.wmnet
  • 15:09 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
  • 15:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd1004.eqiad.wmnet
  • 15:05 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
  • 15:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
  • 14:57 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
  • 14:57 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
  • 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd1003.eqiad.wmnet
  • 14:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd1003.eqiad.wmnet
  • 14:49 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
  • 14:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
  • 14:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2002.codfw.wmnet
  • 14:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2002.codfw.wmnet
  • 14:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
  • 14:20 hnowlan: rest.php on rest-gateway at 100% for enwiki (and all other wikis)
  • 14:19 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
  • 14:15 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:restbase-eqiad
  • 14:14 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
  • 14:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
  • 14:13 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
  • 14:13 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
  • 14:07 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
  • 14:06 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
  • 14:06 fceratto@cumin1002: START - Cookbook sre.mysql.major-upgrade
  • 14:04 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ms-be2057.codfw.wmnet
  • 14:04 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
  • 14:03 btullis@cumin1003: START - Cookbook sre.druid.reboot-workers for Druid public cluster: Reboot Druid nodes
  • 13:58 jmm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 13:49 btullis@cumin1003: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling reboot on A:datahubsearch
  • 13:46 jmm@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 13:43 jmm@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 13:40 jmm@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 13:40 phuedx: UTC afternoon backport window done
  • 13:37 phuedx@deploy2002: Finished scap sync-world: Backport for Port Java Pageview definition to bot detection (T406359) (duration: 17m 39s)
  • 13:34 btullis@cumin1003: START - Cookbook sre.opensearch.roll-restart-reboot rolling reboot on A:datahubsearch
  • 13:33 phuedx@deploy2002: phuedx: Continuing with sync
  • 13:33 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99)
  • 13:31 fceratto@cumin1002: START - Cookbook sre.mysql.major-upgrade
  • 13:31 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.major-upgrade (exit_code=99)
  • 13:30 fceratto@cumin1002: START - Cookbook sre.mysql.major-upgrade
  • 13:26 jmm@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 13:24 jmm@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 13:24 phuedx@deploy2002: phuedx: Backport for Port Java Pageview definition to bot detection (T406359) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:20 phuedx@deploy2002: Started scap sync-world: Backport for Port Java Pageview definition to bot detection (T406359)
  • 13:15 derick@deploy2002: Finished scap sync-world: Backport for session: Enable MultiBackendSessionStore on `group2` wikis (T402808) (duration: 11m 39s)
  • 13:11 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1003.eqiad.wmnet
  • 13:11 derick@deploy2002: derick, d3r1ck01: Continuing with sync
  • 13:09 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1003.eqiad.wmnet
  • 13:09 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1002.eqiad.wmnet
  • 13:08 derick@deploy2002: derick, d3r1ck01: Backport for session: Enable MultiBackendSessionStore on `group2` wikis (T402808) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:06 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1002.eqiad.wmnet
  • 13:06 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
  • 13:06 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1001.eqiad.wmnet
  • 13:05 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-codfw
  • 13:04 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1001.eqiad.wmnet
  • 13:04 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
  • 13:04 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
  • 13:03 derick@deploy2002: Started scap sync-world: Backport for session: Enable MultiBackendSessionStore on `group2` wikis (T402808)
  • 13:01 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
  • 13:01 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
  • 12:59 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
  • 12:59 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2003.codfw.wmnet
  • 12:57 Amir1: dropped flaggedrevs tables on lawikisource (T406424)
  • 12:57 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-codfw
  • 12:56 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2003.codfw.wmnet
  • 12:56 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2002.codfw.wmnet
  • 12:54 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2002.codfw.wmnet
  • 12:53 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2001.codfw.wmnet
  • 12:51 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2001.codfw.wmnet
  • 12:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2003.codfw.wmnet
  • 12:50 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1003.eqiad.wmnet
  • 12:47 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83815 and previous config saved to /var/cache/conftool/dbconfig/20251013-124744-root.json
  • 12:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-cache2003.codfw.wmnet
  • 12:46 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2002.codfw.wmnet
  • 12:45 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-cache1003.eqiad.wmnet
  • 12:45 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1002.eqiad.wmnet
  • 12:44 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 100%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83814 and previous config saved to /var/cache/conftool/dbconfig/20251013-124439-root.json
  • 12:41 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-cache2002.codfw.wmnet
  • 12:41 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2001.codfw.wmnet
  • 12:40 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-cache1002.eqiad.wmnet
  • 12:40 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache1001.eqiad.wmnet
  • 12:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki-root1002.eqiad.wmnet
  • 12:35 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-cache2001.codfw.wmnet
  • 12:35 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-cache1001.eqiad.wmnet
  • 12:32 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83813 and previous config saved to /var/cache/conftool/dbconfig/20251013-123238-root.json
  • 12:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki-root1002.eqiad.wmnet
  • 12:29 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 75%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83812 and previous config saved to /var/cache/conftool/dbconfig/20251013-122933-root.json
  • 12:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki1002.eqiad.wmnet
  • 12:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki1002.eqiad.wmnet
  • 12:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mirror1001.wikimedia.org
  • 12:17 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83811 and previous config saved to /var/cache/conftool/dbconfig/20251013-121732-root.json
  • 12:16 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
  • 12:14 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 60%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83810 and previous config saved to /var/cache/conftool/dbconfig/20251013-121427-root.json
  • 12:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mirror1001.wikimedia.org
  • 12:02 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83809 and previous config saved to /var/cache/conftool/dbconfig/20251013-120226-root.json
  • 11:59 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 50%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83808 and previous config saved to /var/cache/conftool/dbconfig/20251013-115921-root.json
  • 11:47 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83807 and previous config saved to /var/cache/conftool/dbconfig/20251013-114720-root.json
  • 11:45 fceratto@cumin1002: START - Cookbook sre.mysql.major-upgrade
  • 11:44 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 30%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83806 and previous config saved to /var/cache/conftool/dbconfig/20251013-114415-root.json
  • 11:35 marostegui@cumin1003: dbctl commit (dc=all): 'db1247 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83805 and previous config saved to /var/cache/conftool/dbconfig/20251013-113510-root.json
  • 11:33 gehel: restarting blazegraph on wdqs1014 (BlazegraphFreeAllocatorsDecreasingRapidly) - `sudo depool && sleep 30 && sudo systemctl restart wdqs-blazegraph.service && sleep 30 && sudo pool`
  • 11:32 moritzm: installing openssl security updates on Bullseye
  • 11:32 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83804 and previous config saved to /var/cache/conftool/dbconfig/20251013-113214-root.json
  • 11:29 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 25%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83803 and previous config saved to /var/cache/conftool/dbconfig/20251013-112909-root.json
  • 11:20 marostegui@cumin1003: dbctl commit (dc=all): 'db1247 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83802 and previous config saved to /var/cache/conftool/dbconfig/20251013-112004-root.json
  • 11:17 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83801 and previous config saved to /var/cache/conftool/dbconfig/20251013-111708-root.json
  • 11:14 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 20%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83800 and previous config saved to /var/cache/conftool/dbconfig/20251013-111403-root.json
  • 11:04 marostegui@cumin1003: dbctl commit (dc=all): 'db1247 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83799 and previous config saved to /var/cache/conftool/dbconfig/20251013-110458-root.json
  • 11:02 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83798 and previous config saved to /var/cache/conftool/dbconfig/20251013-110203-root.json
  • 10:58 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 10%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83797 and previous config saved to /var/cache/conftool/dbconfig/20251013-105857-root.json
  • 10:49 marostegui@cumin1003: dbctl commit (dc=all): 'db1247 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83796 and previous config saved to /var/cache/conftool/dbconfig/20251013-104952-root.json
  • 10:49 moritzm: installing systemd bugfix updates on Bullseye
  • 10:46 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83795 and previous config saved to /var/cache/conftool/dbconfig/20251013-104657-root.json
  • 10:43 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 7%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83794 and previous config saved to /var/cache/conftool/dbconfig/20251013-104351-root.json
  • 10:41 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1247 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83793 and previous config saved to /var/cache/conftool/dbconfig/20251013-104131-marostegui.json
  • 10:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 10:31 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83792 and previous config saved to /var/cache/conftool/dbconfig/20251013-103151-root.json
  • 10:28 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 5%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83791 and previous config saved to /var/cache/conftool/dbconfig/20251013-102845-root.json
  • 10:24 marostegui@cumin1003: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83790 and previous config saved to /var/cache/conftool/dbconfig/20251013-102428-root.json
  • 10:16 marostegui@cumin1003: dbctl commit (dc=all): 'es1051 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83789 and previous config saved to /var/cache/conftool/dbconfig/20251013-101645-root.json
  • 10:13 marostegui@cumin1003: dbctl commit (dc=all): 'es1049 (re)pooling @ 1%: Host provisioned T406488', diff saved to https://phabricator.wikimedia.org/P83788 and previous config saved to /var/cache/conftool/dbconfig/20251013-101339-root.json
  • 10:09 marostegui@cumin1003: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83787 and previous config saved to /var/cache/conftool/dbconfig/20251013-100923-root.json
  • 10:08 hashar@deploy2002: Finished deploy [gerrit/gerrit@93bde2a]: Fix link to task in the motd banner (duration: 00m 13s)
  • 10:08 hashar@deploy2002: Started deploy [gerrit/gerrit@93bde2a]: Fix link to task in the motd banner
  • 10:03 moritzm: installing Linux 5.10.244 on Bullseye hosts
  • 09:54 marostegui@cumin1003: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83786 and previous config saved to /var/cache/conftool/dbconfig/20251013-095416-root.json
  • 09:39 marostegui@cumin1003: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83785 and previous config saved to /var/cache/conftool/dbconfig/20251013-093910-root.json
  • 09:31 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 09:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1160.eqiad.wmnet with reason: Cloning
  • 09:29 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1160 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83784 and previous config saved to /var/cache/conftool/dbconfig/20251013-092903-marostegui.json
  • 09:21 marostegui@cumin1003: dbctl commit (dc=all): 'db1190 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83783 and previous config saved to /var/cache/conftool/dbconfig/20251013-092152-root.json
  • 09:15 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 09:11 kostajh: UTC morning deploys done
  • 09:10 kharlan@deploy2002: Finished scap sync-world: Backport for ext.confirmEdit.hCaptcha.utils: Track hCaptcha execution rejections (T406925) (duration: 09m 19s)
  • 09:06 marostegui@cumin1003: dbctl commit (dc=all): 'db1190 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83782 and previous config saved to /var/cache/conftool/dbconfig/20251013-090647-root.json
  • 09:06 kharlan@deploy2002: kharlan: Continuing with sync
  • 09:05 kharlan@deploy2002: kharlan: Backport for ext.confirmEdit.hCaptcha.utils: Track hCaptcha execution rejections (T406925) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:01 kharlan@deploy2002: Started scap sync-world: Backport for ext.confirmEdit.hCaptcha.utils: Track hCaptcha execution rejections (T406925)
  • 08:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
  • 08:10 kharlan@deploy2002: kharlan: Continuing with sync
  • 08:09 kharlan@deploy2002: kharlan: Backport for Fix locally failing QUnit tests (T406615) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:08 marostegui@cumin1003: dbctl commit (dc=all): 'db1199 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83776 and previous config saved to /var/cache/conftool/dbconfig/20251013-080837-root.json
  • 08:04 kharlan@deploy2002: Started scap sync-world: Backport for Fix locally failing QUnit tests (T406615)
  • 08:04 kharlan@deploy2002: Finished scap sync-world: Backport for kowikisource: Add "해석" namespace (T406405), kowiki: Restrict move ratelimit for non-extendedconfirmed users (T406849), wmgMonologChannels: Set CheckUser to info level, hCaptcha: Enable on testwiki (T402366), NetworkSession: enable only for private wikis (duration
  • 07:57 kharlan@deploy2002: revi, kharlan, dcausse: Continuing with sync
  • 07:55 kharlan@deploy2002: revi, kharlan, dcausse: Backport for kowikisource: Add "해석" namespace (T406405), kowiki: Restrict move ratelimit for non-extendedconfirmed users (T406849), wmgMonologChannels: Set CheckUser to info level, hCaptcha: Enable on testwiki (T402366), NetworkSession: enable only for private wikis synced to t
  • 07:53 marostegui@cumin1003: dbctl commit (dc=all): 'db1199 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83773 and previous config saved to /var/cache/conftool/dbconfig/20251013-075331-root.json
  • 07:49 kharlan@deploy2002: Started scap sync-world: Backport for kowikisource: Add "해석" namespace (T406405), kowiki: Restrict move ratelimit for non-extendedconfirmed users (T406849), wmgMonologChannels: Set CheckUser to info level, hCaptcha: Enable on testwiki (T402366), NetworkSession: enable only for private wikis
  • 07:46 mszwarc@deploy2002: Finished scap sync-world: Backport for arbcom_plwiki: Change favicon (T406883) (duration: 37m 46s)
  • 07:38 marostegui@cumin1003: dbctl commit (dc=all): 'db1199 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83772 and previous config saved to /var/cache/conftool/dbconfig/20251013-073825-root.json
  • 07:33 mszwarc@deploy2002: mszwarc: Continuing with sync
  • 07:33 mszwarc@deploy2002: mszwarc: Backport for arbcom_plwiki: Change favicon (T406883) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:23 marostegui@cumin1003: dbctl commit (dc=all): 'db1199 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83771 and previous config saved to /var/cache/conftool/dbconfig/20251013-072320-root.json
  • 07:15 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1199 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83770 and previous config saved to /var/cache/conftool/dbconfig/20251013-071521-marostegui.json
  • 07:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 07:08 mszwarc@deploy2002: Started scap sync-world: Backport for arbcom_plwiki: Change favicon (T406883)
  • 06:30 marostegui@cumin1003: dbctl commit (dc=all): 'db1238 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83769 and previous config saved to /var/cache/conftool/dbconfig/20251013-063046-root.json
  • 06:15 marostegui@cumin1003: dbctl commit (dc=all): 'db1238 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83768 and previous config saved to /var/cache/conftool/dbconfig/20251013-061540-root.json
  • 06:00 marostegui@cumin1003: dbctl commit (dc=all): 'db1238 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83767 and previous config saved to /var/cache/conftool/dbconfig/20251013-060034-root.json
  • 05:45 marostegui@cumin1003: dbctl commit (dc=all): 'db1241 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83766 and previous config saved to /var/cache/conftool/dbconfig/20251013-054551-root.json
  • 05:45 marostegui@cumin1003: dbctl commit (dc=all): 'db1238 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83765 and previous config saved to /var/cache/conftool/dbconfig/20251013-054528-root.json
  • 05:37 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1238 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83764 and previous config saved to /var/cache/conftool/dbconfig/20251013-053723-marostegui.json
  • 05:37 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 05:30 marostegui@cumin1003: dbctl commit (dc=all): 'db1241 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83763 and previous config saved to /var/cache/conftool/dbconfig/20251013-053045-root.json
  • 05:20 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es1033 - Depool es1033.eqiad.wmnet to then clone it to es1056.eqiad.wmnet - marostegui@cumin1003
  • 05:15 marostegui@cumin1003: dbctl commit (dc=all): 'db1241 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83762 and previous config saved to /var/cache/conftool/dbconfig/20251013-051540-root.json
  • 05:06 marostegui@cumin1003: START - Cookbook sre.mysql.depool es1033 - Depool es1033.eqiad.wmnet to then clone it to es1056.eqiad.wmnet - marostegui@cumin1003
  • 05:06 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1033.eqiad.wmnet onto es1056.eqiad.wmnet
  • 05:00 marostegui@cumin1003: dbctl commit (dc=all): 'db1241 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83760 and previous config saved to /var/cache/conftool/dbconfig/20251013-050034-root.json
  • 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1241 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83759 and previous config saved to /var/cache/conftool/dbconfig/20251013-045230-marostegui.json
  • 04:52 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 04:49 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es1027 - Depool es1027.eqiad.wmnet to then clone it to es1050.eqiad.wmnet - marostegui@cumin1003
  • 04:49 marostegui@cumin1003: START - Cookbook sre.mysql.depool es1027 - Depool es1027.eqiad.wmnet to then clone it to es1050.eqiad.wmnet - marostegui@cumin1003
  • 04:49 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1027.eqiad.wmnet onto es1050.eqiad.wmnet
  • 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1027,1050].eqiad.wmnet with reason: Cloning
  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 25s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-12

  • 01:01 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 01m 09s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-11

  • 12:34 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2005-dev.codfw.wmnet with OS trixie
  • 09:35 hashar@deploy2002: Finished deploy [integration/docroot@99ef7e9]: build: Update phpunit/phpunit to 10.5.58 (duration: 00m 11s)
  • 09:35 hashar@deploy2002: Started deploy [integration/docroot@99ef7e9]: build: Update phpunit/phpunit to 10.5.58
  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 25s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-10

  • 21:16 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
  • 21:16 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
  • 21:00 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2005-dev.codfw.wmnet with OS trixie
  • 20:57 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1001.eqiad.wmnet with reason: WIP
  • 17:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:17 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 16:50 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 16:50 rzl@deploy1003: helmfile [staging] START helmfile.d/services/zotero: apply
  • 16:49 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 16:49 rzl@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 16:49 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 16:48 rzl@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 16:48 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 16:47 rzl@deploy1003: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 16:46 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 16:46 rzl@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 16:46 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 16:46 rzl@deploy1003: helmfile [staging] START helmfile.d/services/termbox: apply
  • 16:45 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 16:45 rzl@deploy1003: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 16:43 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 16:43 rzl@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 16:43 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 16:42 rzl@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 16:41 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:41 rzl@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:40 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 16:40 rzl@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 16:39 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 16:39 rzl@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 16:39 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 16:38 rzl@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 16:38 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 16:37 rzl@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 16:37 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
  • 16:37 rzl@deploy1003: helmfile [staging] START helmfile.d/services/recommendation-api: apply
  • 16:37 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
  • 16:36 rzl@deploy1003: helmfile [staging] START helmfile.d/services/push-notifications: apply
  • 16:36 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 16:36 rzl@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply
  • 16:36 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 16:36 rzl@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 16:35 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:35 rzl@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:35 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 16:34 rzl@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 16:33 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:31 rzl@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:27 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 16:27 rzl@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 16:27 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 16:27 rzl@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 16:26 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 16:23 rzl@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 16:19 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 16:19 rzl@deploy1003: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 16:16 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 16:16 rzl@deploy1003: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 16:15 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 16:15 rzl@deploy1003: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 16:15 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 16:14 rzl@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 16:14 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 16:14 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 16:14 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 16:13 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 16:11 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 16:11 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 16:10 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 16:10 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 16:09 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 16:09 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:09 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:08 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 16:08 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 16:08 rzl@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 16:07 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 16:07 rzl@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 16:06 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 16:06 rzl@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply
  • 16:05 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 16:05 rzl@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 16:04 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 16:04 rzl@deploy1003: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 16:04 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 16:03 rzl@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 16:03 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 16:03 rzl@deploy1003: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 16:02 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 16:02 rzl@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 16:00 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
  • 16:00 rzl@deploy1003: helmfile [staging] START helmfile.d/services/chart-renderer: apply
  • 15:59 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:58 rzl@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:56 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 15:56 rzl@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 15:39 btullis@cumin1003: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:dse-k8s-worker-codfw
  • 15:10 btullis@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker-codfw
  • 14:45 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:13 marostegui@cumin1003: dbctl commit (dc=all): 'db1242 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83756 and previous config saved to /var/cache/conftool/dbconfig/20251010-141326-root.json
  • 14:06 elukey@cumin1003: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for sretest2001.codfw.wmnet: Renew puppet certificate - elukey@cumin1003
  • 14:03 bking@dns1004: END - running authdns-update
  • 14:02 bking@dns1004: START - running authdns-update
  • 13:58 marostegui@cumin1003: dbctl commit (dc=all): 'db1242 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83755 and previous config saved to /var/cache/conftool/dbconfig/20251010-135820-root.json
  • 13:56 ejegg: donorwiki upgraded from 73c34ea4 to d903982c
  • 13:43 marostegui@cumin1003: dbctl commit (dc=all): 'db1242 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83754 and previous config saved to /var/cache/conftool/dbconfig/20251010-134314-root.json
  • 13:28 marostegui@cumin1003: dbctl commit (dc=all): 'db1242 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83753 and previous config saved to /var/cache/conftool/dbconfig/20251010-132808-root.json
  • 13:20 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1242 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83752 and previous config saved to /var/cache/conftool/dbconfig/20251010-132003-marostegui.json
  • 13:20 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 13:17 fabfur: revert haproxykafka to v0.3.16 on cp5021 and cp7001 (T404427)
  • 12:06 marostegui@cumin1003: dbctl commit (dc=all): 'db1243 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83750 and previous config saved to /var/cache/conftool/dbconfig/20251010-120643-root.json
  • 11:51 marostegui@cumin1003: dbctl commit (dc=all): 'db1243 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83749 and previous config saved to /var/cache/conftool/dbconfig/20251010-115138-root.json
  • 11:36 marostegui@cumin1003: dbctl commit (dc=all): 'db1243 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83748 and previous config saved to /var/cache/conftool/dbconfig/20251010-113632-root.json
  • 11:21 marostegui@cumin1003: dbctl commit (dc=all): 'db1243 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83747 and previous config saved to /var/cache/conftool/dbconfig/20251010-112126-root.json
  • 11:16 marostegui@cumin1003: dbctl commit (dc=all): 'Change es2 eqiad master to es1030 T406488', diff saved to https://phabricator.wikimedia.org/P83746 and previous config saved to /var/cache/conftool/dbconfig/20251010-111653-marostegui.json
  • 11:16 marostegui@cumin1003: dbctl commit (dc=all): 'Change es1 eqiad master to es1029 T406488', diff saved to https://phabricator.wikimedia.org/P83745 and previous config saved to /var/cache/conftool/dbconfig/20251010-111630-marostegui.json
  • 11:16 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 11:16 marostegui@cumin1003: dbctl commit (dc=all): 'Change es3 eqiad master to es1028 T406488', diff saved to https://phabricator.wikimedia.org/P83744 and previous config saved to /var/cache/conftool/dbconfig/20251010-111605-marostegui.json
  • 11:15 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 11:15 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:14 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 11:13 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:13 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 11:13 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1243 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83743 and previous config saved to /var/cache/conftool/dbconfig/20251010-111306-marostegui.json
  • 11:13 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 11:10 marostegui@cumin1003: dbctl commit (dc=all): 'db1248 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83742 and previous config saved to /var/cache/conftool/dbconfig/20251010-111020-root.json
  • 10:55 marostegui@cumin1003: dbctl commit (dc=all): 'db1248 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83741 and previous config saved to /var/cache/conftool/dbconfig/20251010-105514-root.json
  • 10:40 marostegui@cumin1003: dbctl commit (dc=all): 'db1248 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83740 and previous config saved to /var/cache/conftool/dbconfig/20251010-104008-root.json
  • 10:33 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:32 vgutierrez: restarting acme-chief and nginx on acme-chief instances
  • 10:25 marostegui@cumin1003: dbctl commit (dc=all): 'db1248 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83739 and previous config saved to /var/cache/conftool/dbconfig/20251010-102502-root.json
  • 10:17 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1248 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83738 and previous config saved to /var/cache/conftool/dbconfig/20251010-101720-marostegui.json
  • 10:17 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 09:34 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 09:34 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 09:20 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:19 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 06:24 marostegui@cumin1003: dbctl commit (dc=all): 'db1249 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83737 and previous config saved to /var/cache/conftool/dbconfig/20251010-062406-root.json
  • 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1029.eqiad.wmnet onto es1052.eqiad.wmnet
  • 06:10 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1029 gradually with 4 steps - Pool es1029.eqiad.wmnet in after cloning
  • 06:10 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1034.eqiad.wmnet onto es1057.eqiad.wmnet
  • 06:10 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1034 gradually with 4 steps - Pool es1034.eqiad.wmnet in after cloning
  • 06:09 marostegui@cumin1003: dbctl commit (dc=all): 'db1249 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83734 and previous config saved to /var/cache/conftool/dbconfig/20251010-060900-root.json
  • 05:53 marostegui@cumin1003: dbctl commit (dc=all): 'db1249 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83731 and previous config saved to /var/cache/conftool/dbconfig/20251010-055354-root.json
  • 05:38 marostegui@cumin1003: dbctl commit (dc=all): 'db1249 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83728 and previous config saved to /var/cache/conftool/dbconfig/20251010-053848-root.json
  • 05:30 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1249 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83727 and previous config saved to /var/cache/conftool/dbconfig/20251010-053040-marostegui.json
  • 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1029 gradually with 4 steps - Pool es1029.eqiad.wmnet in after cloning
  • 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1034 gradually with 4 steps - Pool es1034.eqiad.wmnet in after cloning
  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 32s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-09

  • 23:10 ryankemper@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2017.*
  • 22:11 inflatador: bking@wdqs10(18|19|20) systemctl start load-categories-daily.service T405978
  • 22:05 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1019.eqiad.wmnet
  • 22:04 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1020.eqiad.wmnet
  • 22:04 jdlrobson@deploy2002: Finished scap sync-world: Backport for Enable instrumentation of watchstar and other links that stopPropagation (T406390) (duration: 41m 38s)
  • 22:00 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1018.eqiad.wmnet
  • 21:51 dwisehaupt: started staging db restore in root screen session on frdb1006. restoring from db backups on 20251008
  • 21:51 jdlrobson@deploy2002: jdlrobson: Continuing with sync
  • 21:47 jdlrobson@deploy2002: jdlrobson: Backport for Enable instrumentation of watchstar and other links that stopPropagation (T406390) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:25 TimStarling: on db2202 cleaned up the tables I created for T400696
  • 21:22 jdlrobson@deploy2002: Started scap sync-world: Backport for Enable instrumentation of watchstar and other links that stopPropagation (T406390)
  • 21:20 wfan: payments-wiki upgraded from 028a0225 to d903982c
  • 20:58 reedy@deploy2002: Finished scap sync-world: Backport for Enable New UI and Multiple Module support for OATHAuth in Wikimedia production (T399644) (duration: 20m 04s)
  • 20:53 reedy@deploy2002: reedy, sbassett: Continuing with sync
  • 20:46 Daimona: Run createAndPromote as in P83722#336349 (~100x, in series) to restore event-organizer membership # T401445
  • 20:42 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker2003.codfw.wmnet with OS bookworm
  • 20:42 reedy@deploy2002: reedy, sbassett: Backport for Enable New UI and Multiple Module support for OATHAuth in Wikimedia production (T399644) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:38 reedy@deploy2002: Started scap sync-world: Backport for Enable New UI and Multiple Module support for OATHAuth in Wikimedia production (T399644)
  • 20:32 mutante: logmsgbot do you still log - test log T284123
  • 20:29 mutante: re-enabled QoS on gerrit servers - with previously stable config - T406774 gerrit:1194811
  • 20:28 reedy@deploy2002: Finished scap sync-world: Backport for OATHAuth Recovery Code code improvement (T406501) (duration: 10m 19s)
  • 20:25 mutante: re-enabling QoS on gerrit servers - with previously stable config - T406774
  • 20:24 reedy@deploy2002: sbassett, reedy: Continuing with sync
  • 20:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker2003.codfw.wmnet with reason: host reimage
  • 20:23 reedy@deploy2002: sbassett, reedy: Backport for OATHAuth Recovery Code code improvement (T406501) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker2003.codfw.wmnet with reason: host reimage
  • 20:18 reedy@deploy2002: Started scap sync-world: Backport for OATHAuth Recovery Code code improvement (T406501)
  • 20:17 reedy@deploy2002: Finished scap sync-world: Backport for Update interwiki cache, Revert "Delete the event-organizer user group on medium and small wikis" (T401445), Assign campaignevents-generate-invitation-lists right explicitly (T401445) (duration: 10m 46s)
  • 20:13 reedy@deploy2002: daimona, reedy: Continuing with sync
  • 20:11 reedy@deploy2002: daimona, reedy: Backport for Update interwiki cache, Revert "Delete the event-organizer user group on medium and small wikis" (T401445), Assign campaignevents-generate-invitation-lists right explicitly (T401445) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:06 reedy@deploy2002: Started scap sync-world: Backport for Update interwiki cache, Revert "Delete the event-organizer user group on medium and small wikis" (T401445), Assign campaignevents-generate-invitation-lists right explicitly (T401445)
  • 20:04 bking@cumin2002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker2003.codfw.wmnet with OS bookworm
  • 20:00 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1020.eqiad.wmnet
  • 19:59 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1019.eqiad.wmnet
  • 19:59 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1018.eqiad.wmnet
  • 19:29 eileen: civicrm upgraded from 14cc3125 to 748922f0
  • 19:22 ejegg: donorwiki upgraded from e8ef5539 to 73c34ea4
  • 19:13 ejegg: civicrm upgraded from 132211d5 to 14cc3125
  • 19:04 jforrester@deploy2002: Finished scap sync-world: Backport for i18n: Pull forward wikimedia-boardelection2025-notification-body updates (duration: 11m 39s)
  • 18:59 jforrester@deploy2002: jforrester: Continuing with sync
  • 18:58 jforrester@deploy2002: jforrester: Backport for i18n: Pull forward wikimedia-boardelection2025-notification-body updates synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:53 jforrester@deploy2002: Started scap sync-world: Backport for i18n: Pull forward wikimedia-boardelection2025-notification-body updates
  • 18:36 cmooney@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1020.eqiad.wmnet
  • 18:36 cmooney@cumin1003: START - Cookbook sre.hosts.remove-downtime for lvs1020.eqiad.wmnet
  • 18:02 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/apertium: apply
  • 18:02 rzl@deploy1003: helmfile [staging] START helmfile.d/services/apertium: apply
  • 17:31 topranks: begin work to move lvs1020 uplink cable from ssw1-f1-eqiad to ssw1-e1-eqiad
  • 17:30 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs1020.eqiad.wmnet with reason: downtime lvs1020 to suppress alerts about enp94s0f0np0 going down and losing backend connectivity
  • 17:08 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:06 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:06 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:05 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:04 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:02 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns entries for inter.link transit IPs in drmrs - cmooney@cumin1003"
  • 16:47 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns entries for inter.link transit IPs in drmrs - cmooney@cumin1003"
  • 16:38 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 16:33 cwhite: upgrade grafana-loki on grafana hosts T406478
  • 16:30 tgr@deploy2002: Finished scap sync-world: Backport for session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634), session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634) (duration: 20m 07s)
  • 16:26 tgr@deploy2002: tgr, d3r1ck01: Continuing with sync
  • 16:18 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 16:18 sukhe: sukhe@lvs2013:~$ sudo systemctl restart pybal.service
  • 16:14 tgr@deploy2002: tgr, d3r1ck01: Backport for session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634), session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:10 tgr@deploy2002: Started scap sync-world: Backport for session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634), session: Improve logging for MultiBackendSessionStore (T402808 T405633 T405634)
  • 15:59 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 15:57 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 15:56 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 15:48 sukhe@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: cluster=proxoid,name=hcaptcha.* [reason: setting weight for proxoid hcaptcha dedicated VM]
  • 15:48 sukhe@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: cluster=proxoid,name=hcatpcha.* [reason: setting weight for proxoid hcaptcha dedicated VM]
  • 15:26 sukhe: sukhe@lvs1019:~$ sudo systemctl restart pybal.service
  • 15:25 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 14:48 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha2002.wikimedia.org with OS bookworm
  • 14:47 sukhe: restart pybal on lvs1020
  • 14:44 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha1002.wikimedia.org with OS bookworm
  • 14:42 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox
  • 14:42 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 14:39 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1003.eqiad.wmnet with OS bullseye
  • 14:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha2001.wikimedia.org with OS bookworm
  • 14:37 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2046.codfw.wmnet']
  • 14:36 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2046.codfw.wmnet']
  • 14:36 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2046.codfw.wmnet']
  • 14:36 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2046.codfw.wmnet']
  • 14:35 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha1001.wikimedia.org with OS bookworm
  • 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 14:34 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 14:31 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha2002.wikimedia.org with reason: host reimage
  • 14:29 hnowlan: rest.php group2-except-enwiki on rest-gateway at 10%
  • 14:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha1002.wikimedia.org with reason: host reimage
  • 14:26 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:23 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha2002.wikimedia.org with reason: host reimage
  • 14:21 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha2001.wikimedia.org with reason: host reimage
  • 14:19 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
  • 14:18 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha2001.wikimedia.org with reason: host reimage
  • 14:17 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha1002.wikimedia.org with reason: host reimage
  • 14:12 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
  • 14:12 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:10 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Delete the event-organizer user group on medium and small wikis (T401445) (duration: 14m 47s)
  • 14:08 sukhe: restart pybal on lvs1020 to pick up WDQS changes
  • 14:05 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1050.eqiad.wmnet
  • 14:05 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Continuing with sync
  • 14:03 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha2002.wikimedia.org with OS bookworm
  • 14:03 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha1002.wikimedia.org with OS bookworm
  • 14:02 Lucas_WMDE: for the record, the `foreachwikiindblist small+medium emptyUserGroup` maintenance script run (for T401445) did *not* work, running the maintenance script separately for small and medium worked better
  • 14:01 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha2001.wikimedia.org with OS bookworm
  • 14:01 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha1001.wikimedia.org with OS bookworm
  • 14:00 lucaswerkmeister-wmde@deploy2002: mwscript-k8s job started: foreachwikiindblist medium emptyUserGroup --create-log '--log-reason=T401445' event-organizer # T401445
  • 14:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Backport for Delete the event-organizer user group on medium and small wikis (T401445) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:59 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1050.eqiad.wmnet
  • 13:56 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough
  • 13:56 lucaswerkmeister-wmde@deploy2002: mwscript-k8s job started: foreachwikiindblist small emptyUserGroup --create-log '--log-reason=T401445' event-organizer # T401445
  • 13:55 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS bullseye
  • 13:55 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Delete the event-organizer user group on medium and small wikis (T401445)
  • 13:54 lucaswerkmeister-wmde@deploy2002: mwscript-k8s job started: foreachwikiindblist small+medium emptyUserGroup --create-log '--log-reason=T401445' event-organizer # T401445
  • 13:53 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 13:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Assign CampaignEvents user rights to autoconfirmed in small and medium wikis (T401445) (duration: 11m 51s)
  • 13:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1003.eqiad.wmnet with OS bullseye
  • 13:44 lucaswerkmeister-wmde@deploy2002: daimona, lucaswerkmeister-wmde: Continuing with sync
  • 13:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox
  • 13:43 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough
  • 13:41 lucaswerkmeister-wmde@deploy2002: daimona, lucaswerkmeister-wmde: Backport for Assign CampaignEvents user rights to autoconfirmed in small and medium wikis (T401445) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:37 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 13:37 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 13:36 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 13:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Assign CampaignEvents user rights to autoconfirmed in small and medium wikis (T401445)
  • 13:36 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 13:34 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 13:32 esanders@deploy2002: Finished scap sync-world: Backport for Revert "Invalidate Flow cache on enwiktionary" (duration: 08m 29s)
  • 13:32 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 13:28 esanders@deploy2002: esanders: Continuing with sync
  • 13:28 esanders@deploy2002: esanders: Backport for Revert "Invalidate Flow cache on enwiktionary" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:24 esanders@deploy2002: Started scap sync-world: Backport for Revert "Invalidate Flow cache on enwiktionary"
  • 13:24 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 13:21 hashar: Zuul successfully reconnected to Gerrit
  • 13:20 hashar: Closed jenkins-bot connections on Gerrit primary
  • 13:08 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp2005.wikimedia.org
  • 13:08 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp2005.wikimedia.org with OS trixie
  • 13:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS bullseye
  • 13:06 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on es2053.codfw.wmnet with reason: Setting up new ES host
  • 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1003.eqiad.wmnet with OS bullseye
  • 12:59 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 12:58 fabfur: enable puppet on A:cp to deploy https://gerrit.wikimedia.org/r/1194676 (T404427)
  • 12:55 arnaudb@dns1004: END - running authdns-update
  • 12:53 arnaudb@dns1004: START - running authdns-update
  • 12:53 arnaudb@dns1004: START - running authdns-update
  • 12:53 arnaudb@dns1004: START - running authdns-update
  • 12:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp2005.wikimedia.org with reason: host reimage
  • 12:47 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 12:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 12:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on idp2005.wikimedia.org with reason: host reimage
  • 12:37 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS bullseye
  • 12:18 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp2005.wikimedia.org with OS trixie
  • 12:18 fabfur: reloading haproxy on A:cp-eqsin (T404427)
  • 12:18 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp2005.wikimedia.org - slyngshede@cumin1003"
  • 12:18 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp2005.wikimedia.org - slyngshede@cumin1003"
  • 12:17 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp2005.wikimedia.org on all recursors
  • 12:17 slyngshede@cumin1003: START - Cookbook sre.dns.wipe-cache idp2005.wikimedia.org on all recursors
  • 12:17 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:17 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp2005.wikimedia.org - slyngshede@cumin1003"
  • 12:17 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp2005.wikimedia.org - slyngshede@cumin1003"
  • 12:13 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
  • 12:13 slyngshede@cumin1003: START - Cookbook sre.ganeti.makevm for new host idp2005.wikimedia.org
  • 12:10 fabfur: enable puppet on A:cp-eqsin to deploy https://gerrit.wikimedia.org/r/1194676 (T404427)
  • 12:07 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit2003.wikimedia.org
  • 12:06 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit2003.wikimedia.org
  • 12:04 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit1003.wikimedia.org
  • 12:04 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit1003.wikimedia.org
  • 12:03 arnaudb@cumin1003: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 12:03 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.topology-check (exit_code=0) Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 12:03 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 12:02 arnaudb@dns1004: START - running authdns-update
  • 11:59 moritzm: installing luajit security updates
  • 11:53 fabfur: disable puppet on A:cp to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1194676 on cp5021 (T404427)
  • 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp1005.wikimedia.org
  • 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp1005.wikimedia.org with OS trixie
  • 11:46 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dbprov2007.codfw.wmnet
  • 11:40 jynus@cumin1002: START - Cookbook sre.hosts.reboot-single for host dbprov2007.codfw.wmnet
  • 11:36 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp1005.wikimedia.org with reason: host reimage
  • 11:32 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on idp1005.wikimedia.org with reason: host reimage
  • 11:27 ladsgroup@cumin1003: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
  • 11:21 ladsgroup@cumin1003: START - Cookbook sre.wikireplicas.update-views
  • 11:21 ladsgroup@cumin1003: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
  • 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp1005.wikimedia.org with OS trixie
  • 11:18 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp1005.wikimedia.org - slyngshede@cumin1003"
  • 11:18 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp1005.wikimedia.org - slyngshede@cumin1003"
  • 11:18 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp1005.wikimedia.org on all recursors
  • 11:18 slyngshede@cumin1003: START - Cookbook sre.dns.wipe-cache idp1005.wikimedia.org on all recursors
  • 11:18 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:18 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp1005.wikimedia.org - slyngshede@cumin1003"
  • 11:16 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp1005.wikimedia.org - slyngshede@cumin1003"
  • 11:14 ladsgroup@cumin1003: START - Cookbook sre.wikireplicas.update-views
  • 11:13 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
  • 11:13 slyngshede@cumin1003: START - Cookbook sre.ganeti.makevm for new host idp1005.wikimedia.org
  • 10:58 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:57 moritzm: installing qemu security updates
  • 10:47 cmooney@dns2005: END - running authdns-update
  • 10:46 cmooney@dns2005: START - running authdns-update
  • 10:37 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 10:29 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:29 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:20 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:20 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:17 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:15 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:12 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 10:11 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 10:11 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 10:11 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 10:10 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:09 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:09 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 10:08 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 10:08 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 10:08 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 10:02 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 10:01 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 10:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'db1252 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83715 and previous config saved to /var/cache/conftool/dbconfig/20251009-095839-root.json
  • 09:44 kharlan@deploy2002: Finished scap sync-world: Backport for Check against correct key in sortEntitiesByTimestamp (T406707) (duration: 11m 18s)
  • 09:43 marostegui@cumin1003: dbctl commit (dc=all): 'db1252 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83713 and previous config saved to /var/cache/conftool/dbconfig/20251009-094333-root.json
  • 09:40 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 09:39 kharlan@deploy2002: kharlan: Continuing with sync
  • 09:39 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 09:38 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:37 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 09:37 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:36 kharlan@deploy2002: kharlan: Backport for Check against correct key in sortEntitiesByTimestamp (T406707) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:36 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 09:32 kharlan@deploy2002: Started scap sync-world: Backport for Check against correct key in sortEntitiesByTimestamp (T406707)
  • 09:31 marostegui@cumin1003: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83711 and previous config saved to /var/cache/conftool/dbconfig/20251009-093131-root.json
  • 09:30 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2046.codfw.wmnet']
  • 09:28 marostegui@cumin1003: dbctl commit (dc=all): 'db1252 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83709 and previous config saved to /var/cache/conftool/dbconfig/20251009-092827-root.json
  • 09:24 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2046.codfw.wmnet']
  • 09:23 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2045.codfw.wmnet']
  • 09:23 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2045.codfw.wmnet']
  • 09:21 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 09:21 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 09:16 marostegui@cumin1003: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83708 and previous config saved to /var/cache/conftool/dbconfig/20251009-091626-root.json
  • 09:13 marostegui@cumin1003: dbctl commit (dc=all): 'db1252 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83707 and previous config saved to /var/cache/conftool/dbconfig/20251009-091322-root.json
  • 09:05 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1252 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83706 and previous config saved to /var/cache/conftool/dbconfig/20251009-090516-marostegui.json
  • 09:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1252.eqiad.wmnet with reason: Maintenance
  • 09:01 marostegui@cumin1003: dbctl commit (dc=all): 'db2179 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83705 and previous config saved to /var/cache/conftool/dbconfig/20251009-090120-root.json
  • 08:53 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 08:53 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 08:52 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2050.codfw.wmnet']
  • 08:52 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2050.codfw.wmnet']
  • 08:52 elukey@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cp2050.codfw.wmnet']
  • 08:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 08:46 marostegui@cumin1003: dbctl commit (dc=all): 'db2179 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83704 and previous config saved to /var/cache/conftool/dbconfig/20251009-084614-root.json
  • 08:44 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 08:38 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2179 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83703 and previous config saved to /var/cache/conftool/dbconfig/20251009-083801-marostegui.json
  • 08:37 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 08:34 marostegui@cumin1003: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83702 and previous config saved to /var/cache/conftool/dbconfig/20251009-083432-root.json
  • 08:26 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 08:26 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 08:22 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.22 refs T405678
  • 08:19 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:19 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 08:19 marostegui@cumin1003: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83701 and previous config saved to /var/cache/conftool/dbconfig/20251009-081926-root.json
  • 08:19 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 08:18 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:18 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2043.codfw.wmnet']
  • 08:18 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2043.codfw.wmnet']
  • 08:12 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2044']
  • 08:12 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2044']
  • 08:07 kharlan@deploy2002: Finished scap sync-world: Backport for ConfirmEdit/hCaptcha: Implement automatic failover (T404204) (duration: 13m 14s)
  • 08:04 marostegui@cumin1003: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83700 and previous config saved to /var/cache/conftool/dbconfig/20251009-080420-root.json
  • 08:03 kharlan@deploy2002: kharlan: Continuing with sync
  • 07:59 joal@deploy2002: Finished deploy [analytics/refinery@af75327] (thin): Analytics deploy - druid pageviews_daily - THIN [analytics/refinery@af753272] (duration: 02m 10s)
  • 07:59 kharlan@deploy2002: kharlan: Backport for ConfirmEdit/hCaptcha: Implement automatic failover (T404204) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:57 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab
  • 07:57 joal@deploy2002: Started deploy [analytics/refinery@af75327] (thin): Analytics deploy - druid pageviews_daily - THIN [analytics/refinery@af753272]
  • 07:56 joal@deploy2002: Finished deploy [analytics/refinery@af75327]: Analytics deploy - druid pageviews_daily [analytics/refinery@af753272] (duration: 03m 53s)
  • 07:54 kharlan@deploy2002: Started scap sync-world: Backport for ConfirmEdit/hCaptcha: Implement automatic failover (T404204)
  • 07:53 kharlan@deploy2002: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /usr/local/bin/update-mediawiki-tools-release' returned non-zero exit status 1. (scap version: 4.213.0) (duration: 00m 00s)
  • 07:53 joal@deploy2002: Started deploy [analytics/refinery@af75327]: Analytics deploy - druid pageviews_daily [analytics/refinery@af753272]
  • 07:52 joal@deploy2002: Finished deploy [analytics/refinery@af75327] (hadoop-test): Analytics deploy - druid pageviews_daily - TEST [analytics/refinery@af753272] (duration: 00m 54s)
  • 07:51 joal@deploy2002: Started deploy [analytics/refinery@af75327] (hadoop-test): Analytics deploy - druid pageviews_daily - TEST [analytics/refinery@af753272]
  • 07:49 marostegui@cumin1003: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83699 and previous config saved to /var/cache/conftool/dbconfig/20251009-074914-root.json
  • 07:47 kharlan@deploy2002: Finished scap sync-world: Backport for EventStreamConfig: Fix user-agent exclusion config (T387600), EventStreamConfig: fix IP auto reveal stream (duration: 11m 53s)
  • 07:43 kharlan@deploy2002: kharlan, bearloga: Continuing with sync
  • 07:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1018.eqiad.wmnet with OS bullseye
  • 07:40 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2147 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83698 and previous config saved to /var/cache/conftool/dbconfig/20251009-074055-marostegui.json
  • 07:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 07:40 kharlan@deploy2002: kharlan, bearloga: Backport for EventStreamConfig: Fix user-agent exclusion config (T387600), EventStreamConfig: fix IP auto reveal stream synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:35 kharlan@deploy2002: Started scap sync-world: Backport for EventStreamConfig: Fix user-agent exclusion config (T387600), EventStreamConfig: fix IP auto reveal stream
  • 07:31 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1034.eqiad.wmnet onto es1057.eqiad.wmnet
  • 07:29 kharlan@deploy2002: Finished scap sync-world: Backport for hCaptcha: Provide capabilities for failing over to alternate CAPTCHA type (T404204) (duration: 11m 54s)
  • 07:25 kharlan@deploy2002: kharlan: Continuing with sync
  • 07:24 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1029.eqiad.wmnet onto es1052.eqiad.wmnet
  • 07:22 kharlan@deploy2002: kharlan: Backport for hCaptcha: Provide capabilities for failing over to alternate CAPTCHA type (T404204) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:20 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1020.eqiad.wmnet -> wdqs1019.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 07:20 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1017.eqiad.wmnet -> wdqs1018.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 07:17 kharlan@deploy2002: Started scap sync-world: Backport for hCaptcha: Provide capabilities for failing over to alternate CAPTCHA type (T404204)
  • 07:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1029,1034].eqiad.wmnet with reason: Cloning
  • 07:14 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1034 and es1029 T406488', diff saved to https://phabricator.wikimedia.org/P83697 and previous config saved to /var/cache/conftool/dbconfig/20251009-071430-marostegui.json
  • 07:05 moritzm: installing Redis security updates
  • 06:53 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab
  • 06:50 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1019.eqiad.wmnet with OS bullseye
  • 06:48 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab replica
  • 06:39 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab replica
  • 06:31 marostegui@cumin1003: dbctl commit (dc=all): 'db2155 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83696 and previous config saved to /var/cache/conftool/dbconfig/20251009-063106-root.json
  • 06:28 ryankemper@cumin2002: conftool action : set/pooled=no:weight=10; selector: name=wdqs1019.*
  • 06:28 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1020.eqiad.wmnet -> wdqs1019.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 06:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1017.eqiad.wmnet -> wdqs1018.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 06:27 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1017.eqiad.wmnet -> wdqs1018.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 06:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T405978, transfer to freshly reimaged host) xfer wikidata_main from wdqs1017.eqiad.wmnet -> wdqs1018.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 06:26 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-main host (duration: 00m 13s)
  • 06:26 ryankemper@deploy2002: Started deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-main host
  • 06:26 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-main host (duration: 00m 14s)
  • 06:26 ryankemper@deploy2002: Started deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-main host
  • 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1027.eqiad.wmnet onto es1050.eqiad.wmnet
  • 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1027 gradually with 4 steps - Pool es1027.eqiad.wmnet in after cloning
  • 06:16 marostegui@cumin1003: dbctl commit (dc=all): 'db2155 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83694 and previous config saved to /var/cache/conftool/dbconfig/20251009-061600-root.json
  • 06:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1030.eqiad.wmnet onto es1053.eqiad.wmnet
  • 06:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1030 gradually with 4 steps - Pool es1030.eqiad.wmnet in after cloning
  • 06:00 marostegui@cumin1003: dbctl commit (dc=all): 'db2155 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83691 and previous config saved to /var/cache/conftool/dbconfig/20251009-060054-root.json
  • 05:45 marostegui@cumin1003: dbctl commit (dc=all): 'db2155 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83688 and previous config saved to /var/cache/conftool/dbconfig/20251009-054548-root.json
  • 05:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add es1050 and es1053 depooled T406488', diff saved to https://phabricator.wikimedia.org/P83687 and previous config saved to /var/cache/conftool/dbconfig/20251009-054347-marostegui.json
  • 05:37 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2155 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83686 and previous config saved to /var/cache/conftool/dbconfig/20251009-053730-marostegui.json
  • 05:37 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 05:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1027 gradually with 4 steps - Pool es1027.eqiad.wmnet in after cloning
  • 05:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1030 gradually with 4 steps - Pool es1030.eqiad.wmnet in after cloning
  • 04:13 eileen: civicrm upgraded from 6f24d513 to 132211d5
  • 02:11 dzahn@cumin2002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: security release 20251008
  • 02:02 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: security release 20251008
  • 01:54 mutante: [wdqs1020:~] $ sudo systemctl restart wdqs-blazegraph
  • 01:32 eileen: civicrm upgraded from 4c13f904 to 6f24d513
  • 01:18 eileen: civicrm upgraded from 2c6fedc8 to 4c13f904
  • 01:15 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 14m 20s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-08

  • 23:58 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1019.eqiad.wmnet with reason: host reimage
  • 23:54 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1018.eqiad.wmnet with reason: host reimage
  • 23:50 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1019.eqiad.wmnet with reason: host reimage
  • 23:50 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1018.eqiad.wmnet with reason: host reimage
  • 22:09 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T405978, transfer to freshly reimaged host) xfer scholarly_articles from wdqs2016.codfw.wmnet -> wdqs2017.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 21:25 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1019.eqiad.wmnet with OS bullseye
  • 21:19 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T405978, transfer to freshly reimaged host) xfer scholarly_articles from wdqs2016.codfw.wmnet -> wdqs2017.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 21:19 ryankemper@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) (T405978, transfer to freshly reimaged host) xfer scholarly_articles from wdqs2016.codfw.wmnet -> wdqs2017.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 21:18 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T405978, transfer to freshly reimaged host) xfer scholarly_articles from wdqs2016.codfw.wmnet -> wdqs2017.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 21:13 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: deploy to fresh internal-scholarly host T405978 (duration: 00m 12s)
  • 21:13 ryankemper@deploy2002: Started deploy [wdqs/wdqs@fea7794]: deploy to fresh internal-scholarly host T405978
  • 21:10 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1018.eqiad.wmnet with OS bullseye
  • 20:36 tgr_: UTC late deploys done
  • 20:35 tgr@deploy2002: Finished scap sync-world: Backport for Deploy JWT session cookies to group2 (T399631) (duration: 13m 53s)
  • 20:31 tgr@deploy2002: tgr: Continuing with sync
  • 20:26 tgr@deploy2002: tgr: Backport for Deploy JWT session cookies to group2 (T399631) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:21 tgr@deploy2002: Started scap sync-world: Backport for Deploy JWT session cookies to group2 (T399631)
  • 20:19 tgr@deploy2002: Finished scap sync-world: Backport for eswiki, commonswiki: lift IP cap for workshop (T406655), Launch VisualEditor EditCheck paste check a/b test to 22 wikis (T405422) (duration: 13m 03s)
  • 20:15 tgr@deploy2002: tgr, kemayo, anzx: Continuing with sync
  • 20:11 tgr@deploy2002: tgr, kemayo, anzx: Backport for eswiki, commonswiki: lift IP cap for workshop (T406655), Launch VisualEditor EditCheck paste check a/b test to 22 wikis (T405422) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:06 tgr@deploy2002: Started scap sync-world: Backport for eswiki, commonswiki: lift IP cap for workshop (T406655), Launch VisualEditor EditCheck paste check a/b test to 22 wikis (T405422)
  • 20:02 hashar: Disabled Gerrit Apache mod_qos by putting it to be logging only # T406774
  • 19:30 krinkle@deploy2002: Finished scap sync-world: Backport for Disable wmgUseMdotRouting on remaining Wikipedias except enwiki (T403510), Disable wmgUseMdotRouting on enwiki (T403510) (duration: 09m 26s)
  • 19:25 krinkle@deploy2002: krinkle: Continuing with sync
  • 19:25 krinkle@deploy2002: krinkle: Backport for Disable wmgUseMdotRouting on remaining Wikipedias except enwiki (T403510), Disable wmgUseMdotRouting on enwiki (T403510) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 19:20 krinkle@deploy2002: Started scap sync-world: Backport for Disable wmgUseMdotRouting on remaining Wikipedias except enwiki (T403510), Disable wmgUseMdotRouting on enwiki (T403510)
  • 19:10 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha1001.wikimedia.org with OS bookworm
  • 18:58 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2005-dev.codfw.wmnet with OS bookworm
  • 18:56 ssastry@deploy2002: Finished scap sync-world: Backport for Revert "Add a DOM version of the TOC markers pass" (duration: 16m 00s)
  • 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
  • 18:50 ssastry@deploy2002: ssastry: Continuing with sync
  • 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
  • 18:46 ssastry@deploy2002: ssastry: Backport for Revert "Add a DOM version of the TOC markers pass" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:43 hashar: For posterity: October 8th 2025. The day brett and Krinkle are getting rid of the last .m. subdomain.
  • 18:40 ssastry@deploy2002: Started scap sync-world: Backport for Revert "Add a DOM version of the TOC markers pass"
  • 18:36 brett: Enable unified mobile routing on en.wikipedia.org rollout complete - T403510
  • 18:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha1001.wikimedia.org with OS bookworm
  • 18:33 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release vX.Y.Z - cmooney@cumin1003
  • 18:32 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha1001.wikimedia.org with OS bookworm
  • 18:31 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release vX.Y.Z - cmooney@cumin1003
  • 18:27 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on wdqs2017.codfw.wmnet with reason: finish getting host ready for production
  • 18:04 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
  • 17:59 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
  • 17:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
  • 17:54 swfrench-wmf: completed post-switchover right-sizing of large mediawiki services - T405955
  • 17:51 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 17:51 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 17:51 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:50 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 17:50 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:49 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:49 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 17:49 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 17:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
  • 17:45 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 17:45 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:44 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:44 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:42 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 17:42 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 17:42 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2005-dev.codfw.wmnet with OS bookworm
  • 17:39 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 17:39 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 17:34 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha1001.wikimedia.org with OS bookworm
  • 17:33 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 17:32 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 17:32 brett: Enable unified mobile routing on en.wikipedia.org - T403510
  • 17:26 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:26 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:22 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2052 gradually with 4 steps - Pooling in new host
  • 17:20 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:19 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 17:18 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 17:18 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 17:11 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 17:10 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 17:10 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:10 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 17:10 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 17:10 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:10 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:10 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 17:10 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 17:10 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 17:09 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 17:09 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:09 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 17:09 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:09 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:09 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 17:09 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 16:53 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul2002.codfw.wmnet with reason: WIP
  • 16:43 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul2001.codfw.wmnet with reason: WIP
  • 16:43 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1001.eqiad.wmnet with reason: WIP
  • 16:42 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1002.eqiad.wmnet with reason: WIP
  • 16:37 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2052 gradually with 4 steps - Pooling in new host
  • 16:36 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es2052.codfw.wmnet
  • 16:36 fceratto@cumin1002: START - Cookbook sre.hosts.remove-downtime for es2052.codfw.wmnet
  • 16:30 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2017.codfw.wmnet with OS bullseye
  • 16:26 fceratto@cumin1002: dbctl commit (dc=all): 'Add es2052 T402859', diff saved to https://phabricator.wikimedia.org/P83675 and previous config saved to /var/cache/conftool/dbconfig/20251008-162623-fceratto.json
  • 16:13 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2017.codfw.wmnet with reason: host reimage
  • 16:10 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2017.codfw.wmnet with reason: host reimage
  • 15:53 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2017.codfw.wmnet with OS bullseye
  • 15:51 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-launcher1003.eqiad.wmnet with OS bullseye
  • 15:37 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 15:37 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 15:34 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 15:34 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 15:33 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 15:33 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 15:18 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-launcher1003.eqiad.wmnet with reason: host reimage
  • 15:16 elukey: reboot ms-be1088 as a test for T404356
  • 15:14 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be1088.eqiad.wmnet with reason: testing
  • 15:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-launcher1003.eqiad.wmnet with reason: host reimage
  • 15:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1003.eqiad.wmnet with OS bullseye
  • 15:11 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be2088.codfw.wmnet with reason: testing
  • 15:05 Lucas_WMDE: UTC afternoon backport+config window done
  • 15:03 derick@deploy2002: Finished scap sync-world: Backport for SharedDomainHookHandler: Remove WebAuthn sitenotice, SharedDomainHookHandler: Remove WebAuthn sitenotice (duration: 42m 36s)
  • 14:59 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-launcher1003.eqiad.wmnet with OS bullseye
  • 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 14:50 derick@deploy2002: d3r1ck01, derick: Continuing with sync
  • 14:50 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 14:47 derick@deploy2002: d3r1ck01, derick: Backport for SharedDomainHookHandler: Remove WebAuthn sitenotice, SharedDomainHookHandler: Remove WebAuthn sitenotice synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:34 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS bullseye
  • 14:33 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2043.codfw.wmnet with OS bullseye
  • 14:29 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
  • 14:20 derick@deploy2002: Started scap sync-world: Backport for SharedDomainHookHandler: Remove WebAuthn sitenotice, SharedDomainHookHandler: Remove WebAuthn sitenotice
  • 14:12 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:12 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:11 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:11 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:11 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:10 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:10 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Temporarily undeploy JWT session cookies (T399631), jwt: Use core cookie settings (T406621), jwt: Use core cookie settings (T406621), Force OATHManage to be on central domain (T401773), Force OATHManage to be on central domain (T401773) (duration: 14m 0
  • 14:09 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site esams [reason: cr1-esams is back online and working after card re-seat, T406705]
  • 14:09 cmooney@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool site esams [reason: cr1-esams is back online and working after card re-seat, T406705]
  • 14:08 topranks: re-pool esams in dns after cr1-esams restored to normal operation T406705
  • 14:07 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:06 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:06 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:05 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:04 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:04 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:03 lucaswerkmeister-wmde@deploy2002: d3r1ck01, lucaswerkmeister-wmde, reedy, tgr: Continuing with sync
  • 14:01 lucaswerkmeister-wmde@deploy2002: d3r1ck01, lucaswerkmeister-wmde, reedy, tgr: Backport for Temporarily undeploy JWT session cookies (T399631), jwt: Use core cookie settings (T406621), jwt: Use core cookie settings (T406621), Force OATHManage to be on central domain (T401773), Force OATHManage to be on central domain (T401773)
  • 13:56 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Temporarily undeploy JWT session cookies (T399631), jwt: Use core cookie settings (T406621), jwt: Use core cookie settings (T406621), Force OATHManage to be on central domain (T401773), Force OATHManage to be on central domain (T401773)
  • 13:54 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Disable mobilefrontend on donatewiki (T406638) (duration: 44m 23s)
  • 13:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1003.eqiad.wmnet with OS bullseye
  • 13:42 lucaswerkmeister-wmde@deploy2002: pcoombe, lucaswerkmeister-wmde: Continuing with sync
  • 13:39 lucaswerkmeister-wmde@deploy2002: pcoombe, lucaswerkmeister-wmde: Backport for Disable mobilefrontend on donatewiki (T406638) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 13:28 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 13:27 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2001.codfw.wmnet
  • 13:19 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2001.codfw.wmnet
  • 13:14 jgleeson: civicrm upgraded from 9db8f0d5 to 2c6fedc8
  • 13:11 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS bullseye
  • 13:10 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Disable mobilefrontend on donatewiki (T406638)
  • 13:10 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2002.codfw.wmnet
  • 13:03 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2002.codfw.wmnet
  • 12:49 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp-test2005.wikimedia.org
  • 12:49 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp-test2005.wikimedia.org with OS trixie
  • 12:45 derick@deploy2002: mwscript-k8s job started: extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=fywiki --logwiki=metawiki Constable31 Shogeneral # T406731
  • 12:33 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test2005.wikimedia.org with reason: host reimage
  • 12:28 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test2005.wikimedia.org with reason: host reimage
  • 12:25 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 12:25 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: sync
  • 12:24 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 12:24 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
  • 12:22 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 12:22 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2078.codfw.wmnet with OS bullseye
  • 12:22 elukey@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: sync
  • 12:15 mvernon@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ms-be[2083-2084].codfw.wmnet with reason: awaiting controller swap
  • 12:10 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp-test2005.wikimedia.org with OS trixie
  • 12:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test2005.wikimedia.org - slyngshede@cumin1003"
  • 12:10 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test2005.wikimedia.org - slyngshede@cumin1003"
  • 12:09 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp-test2005.wikimedia.org on all recursors
  • 12:09 slyngshede@cumin1003: START - Cookbook sre.dns.wipe-cache idp-test2005.wikimedia.org on all recursors
  • 12:09 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:09 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test2005.wikimedia.org - slyngshede@cumin1003"
  • 12:09 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test2005.wikimedia.org - slyngshede@cumin1003"
  • 12:08 btullis@cumin1003: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on P{dse-k8s-worker2002.codfw.wmnet} and (A:dse-k8s-master-codfw or A:dse-k8s-worker-codfw)
  • 12:07 btullis@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{dse-k8s-worker2002.codfw.wmnet} and (A:dse-k8s-master-codfw or A:dse-k8s-worker-codfw)
  • 12:05 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
  • 12:05 slyngshede@cumin1003: START - Cookbook sre.ganeti.makevm for new host idp-test2005.wikimedia.org
  • 12:05 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp-test2005.wikimedia.org
  • 12:05 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:05 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test2005.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1003"
  • 12:05 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test2005.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1003"
  • 12:04 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 12:01 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
  • 11:59 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 11:57 slyngshede@cumin1003: START - Cookbook sre.hosts.decommission for hosts idp-test2005.wikimedia.org
  • 11:50 slyngshede@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host idp-test2005.wikimedia.org
  • 11:50 slyngshede@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 11:47 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
  • 11:47 slyngshede@cumin1003: START - Cookbook sre.ganeti.makevm for new host idp-test2005.wikimedia.org
  • 11:43 slyngshede@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host idp-test2005.wikimedia.org
  • 11:43 slyngshede@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 11:42 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2078
  • 11:42 mvernon@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2078
  • 11:40 mvernon@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2078
  • 11:40 mvernon@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2078.codfw.wmnet 239.32.192.10.in-addr.arpa 9.3.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 11:40 mvernon@cumin1002: START - Cookbook sre.dns.wipe-cache ms-be2078.codfw.wmnet 239.32.192.10.in-addr.arpa 9.3.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 11:40 mvernon@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:40 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2078 - mvernon@cumin1002"
  • 11:39 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
  • 11:39 slyngshede@cumin1003: START - Cookbook sre.ganeti.makevm for new host idp-test2005.wikimedia.org
  • 11:34 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2078 - mvernon@cumin1002"
  • 11:34 slyngshede@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host idp-test2005.wikimedia.org with OS bookworm
  • 11:30 btullis@cumin1003: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:dse-k8s-worker-codfw
  • 11:28 btullis@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker-codfw
  • 11:28 mvernon@cumin1002: START - Cookbook sre.dns.netbox
  • 11:26 claime: Enabling puppet on cp nodes - 1193903: gateway-check: Group-based routing approach | https://gerrit.wikimedia.org/r/c/operations/puppet/+/1193903 - T406318
  • 11:25 mvernon@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 11:22 mvernon@cumin1002: START - Cookbook sre.dns.netbox
  • 11:22 mvernon@cumin1002: START - Cookbook sre.hosts.move-vlan for host ms-be2078
  • 11:22 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2078.codfw.wmnet with OS bullseye
  • 11:19 mvernon@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2078.codfw.wmnet with OS trixie
  • 11:09 moritzm: imported megacli into thirdparty/hwraid (upstream repo doesn't cover trixie yet, copied over from bookworm) T391083
  • 10:53 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp-test2005.wikimedia.org with OS bookworm
  • 10:43 claime: Disabling puppet on cp nodes - 1193903: gateway-check: Group-based routing approach | https://gerrit.wikimedia.org/r/c/operations/puppet/+/1193903 - T406318
  • 10:39 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 10:34 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:34 jmm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 10:33 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 10:33 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:31 jmm@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 10:30 jmm@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 10:29 jmm@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 10:22 jmm@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 10:20 jmm@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
  • 10:20 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
  • 10:17 slyngshede@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host idp-test2005.wikimedia.org with OS trixie
  • 10:16 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
  • 10:15 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2078.codfw.wmnet with OS trixie
  • 10:14 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-all
  • 10:02 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all
  • 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 09:47 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
  • 09:37 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es2027.codfw.wmnet onto es2052.codfw.wmnet
  • 09:37 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2027 gradually with 4 steps - Pool es2027.codfw.wmnet in after cloning
  • 09:36 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp-test2005.wikimedia.org with OS trixie
  • 09:24 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 09:24 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 09:08 topranks: disable BGP to asw*-esams from cr1-esams as the CR external links are also down
  • 09:02 mvernon@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site esams [reason: no reason specified, ]
  • 09:02 Emperor: depool esams
  • 09:02 mvernon@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site esams [reason: no reason specified, ]
  • 08:52 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2027 gradually with 4 steps - Pool es2027.codfw.wmnet in after cloning
  • 08:50 marostegui@cumin1003: dbctl commit (dc=all): 'db2172 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83669 and previous config saved to /var/cache/conftool/dbconfig/20251008-085005-root.json
  • 08:44 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2058.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:35 marostegui@cumin1003: dbctl commit (dc=all): 'db2172 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83667 and previous config saved to /var/cache/conftool/dbconfig/20251008-083459-root.json
  • 08:33 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2058.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:31 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2057.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2057.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:19 marostegui@cumin1003: dbctl commit (dc=all): 'db2172 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83666 and previous config saved to /var/cache/conftool/dbconfig/20251008-081953-root.json
  • 08:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.22 refs T405678
  • 08:04 marostegui@cumin1003: dbctl commit (dc=all): 'db2172 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83665 and previous config saved to /var/cache/conftool/dbconfig/20251008-080448-root.json
  • 08:03 slyngshede@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host idp-test2005.wikimedia.org with OS trixie
  • 08:02 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2055.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:00 moritzm: installing libxml2 security updates
  • 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2172 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83664 and previous config saved to /var/cache/conftool/dbconfig/20251008-075612-marostegui.json
  • 07:56 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 07:52 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2055.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2054.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:47 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2054.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:46 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2053.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:44 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2053.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:37 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2052.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:27 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wikidata: apply
  • 07:22 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp-test2005.wikimedia.org with OS trixie
  • 07:21 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1030.eqiad.wmnet onto es1053.eqiad.wmnet
  • 07:17 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply
  • 07:16 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1030 T406488', diff saved to https://phabricator.wikimedia.org/P83663 and previous config saved to /var/cache/conftool/dbconfig/20251008-071656-marostegui.json
  • 07:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2052.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:15 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2051.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2051.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 06:57 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1018.eqiad.wmnet with OS bullseye
  • 06:55 slyngshede@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host idp-test2005.wikimedia.org with OS trixie
  • 06:53 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp-test2005.wikimedia.org with OS trixie
  • 06:31 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1027.eqiad.wmnet onto es1050.eqiad.wmnet
  • 06:29 moritzm: installing openssl security updates
  • 06:27 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1027 T406488', diff saved to https://phabricator.wikimedia.org/P83662 and previous config saved to /var/cache/conftool/dbconfig/20251008-062752-marostegui.json
  • 06:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1027,1030].eqiad.wmnet with reason: Cloning
  • 06:24 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1026.eqiad.wmnet onto es1049.eqiad.wmnet
  • 06:24 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1026 gradually with 4 steps - Pool es1026.eqiad.wmnet in after cloning
  • 06:24 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es1028.eqiad.wmnet onto es1051.eqiad.wmnet
  • 06:24 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1028 gradually with 4 steps - Pool es1028.eqiad.wmnet in after cloning
  • 06:24 marostegui@cumin1003: dbctl commit (dc=all): 'Add es1049 and es1051 to dbctl depooled T406488', diff saved to https://phabricator.wikimedia.org/P83659 and previous config saved to /var/cache/conftool/dbconfig/20251008-062404-marostegui.json
  • 06:12 moritzm: rebalance Ganeti eqiad/D following vmscape reboots
  • 05:37 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1026 gradually with 4 steps - Pool es1026.eqiad.wmnet in after cloning
  • 05:37 marostegui@cumin1003: START - Cookbook sre.mysql.pool es1028 gradually with 4 steps - Pool es1028.eqiad.wmnet in after cloning
  • 04:41 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1018.eqiad.wmnet with OS bullseye
  • 04:37 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-main host T405978 (duration: 00m 14s)
  • 04:37 ryankemper@deploy2002: Started deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-main host T405978
  • 03:55 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-internal-main host T405978 (duration: 02m 01s)
  • 03:53 ryankemper@deploy2002: Started deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-internal-main host T405978
  • 03:53 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-internal-main host T405978 (duration: 16m 11s)
  • 03:52 ryankemper@cumin2002: conftool action : set/pooled=no:weight=10; selector: name=wdqs1018.*
  • 03:41 ryankemper@cumin2002: conftool action : GET; selector: name=wdqs1018.eqiad.wmnet
  • 03:38 ryankemper@cumin2002: conftool action : set/pooled=no:weight=10; selector: name=wdqs1018.*
  • 03:37 ryankemper@deploy2002: Started deploy [wdqs/wdqs@fea7794]: deploy to fresh wdqs-internal-main host T405978
  • 02:33 eileen: civicrm upgraded from 8228670e to 9db8f0d5
  • 02:27 eileen: civicrm upgraded from 7a81fe1c to 8228670e
  • 02:19 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 02:12 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1018.eqiad.wmnet with OS bullseye
  • 02:10 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 02:09 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 02:05 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 01:54 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1018.eqiad.wmnet with reason: host reimage
  • 01:50 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1018.eqiad.wmnet with reason: host reimage
  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 13s)
  • 01:01 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
  • 00:27 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 00:17 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 00:09 sbassett: Deployed security mitigation for T406664 to 1.45.0-wmf.22
  • 00:00 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply

2025-10-07

  • 23:58 sbassett: Deployed security mitigation for T406664 to 1.45.0-wmf.21
  • 23:58 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:57 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:57 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:57 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:54 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:53 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:53 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:52 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:50 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:47 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:45 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:18 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 23:13 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 22:46 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 22:35 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 22:12 bking@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "running per cookbook error suggestion - bking@cumin2002 - T399778"
  • 22:11 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "running per cookbook error suggestion - bking@cumin2002 - T399778"
  • 22:04 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1018.eqiad.wmnet with OS bullseye
  • 22:02 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1020\.eqiad\.wmnet
  • 21:50 bking@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: T405978 (duration: 00m 45s)
  • 21:49 bking@deploy2002: Started deploy [wdqs/wdqs@fea7794]: T405978
  • 21:48 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on wdqs1020.eqiad.wmnet with reason: finish getting host ready for production
  • 21:41 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T405978, transfer main graph to newly-reimaged host) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1020.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
  • 21:41 tgr_: UTC late deploys done
  • 21:40 tgr@deploy2002: Finished scap sync-world: Backport for session: Log actual class name in preventSessionsForUser exception (T406566), session: Log actual class name in preventSessionsForUser exception (T406566), session: Log cache write flags in `SessionStore::set()` (T405633 T405634), session: Log cache write flags in `SessionStore::set()` (T405
  • 21:36 tgr@deploy2002: tgr: Continuing with sync
  • 21:34 tgr@deploy2002: tgr: Backport for session: Log actual class name in preventSessionsForUser exception (T406566), session: Log actual class name in preventSessionsForUser exception (T406566), session: Log cache write flags in `SessionStore::set()` (T405633 T405634), session: Log cache write flags in `SessionStore::set()` (T405633 T405634) synced
  • 21:30 tgr@deploy2002: Started scap sync-world: Backport for session: Log actual class name in preventSessionsForUser exception (T406566), session: Log actual class name in preventSessionsForUser exception (T406566), session: Log cache write flags in `SessionStore::set()` (T405633 T405634), session: Log cache write flags in `SessionStore::set()` (T4056
  • 21:28 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 21:17 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 21:17 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 21:17 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 21:16 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 21:14 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 20:58 aaron@deploy2002: Finished scap sync-world: Backport for Add restbase spec JSON files to which /rest_v1/?spec can be routed (T397203 T396805) (duration: 10m 13s)
  • 20:54 aaron@deploy2002: aaron: Continuing with sync
  • 20:53 aaron@deploy2002: aaron: Backport for Add restbase spec JSON files to which /rest_v1/?spec can be routed (T397203 T396805) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:50 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker2003.codfw.wmnet with OS bookworm
  • 20:50 bking@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002"
  • 20:48 aaron@deploy2002: Started scap sync-world: Backport for Add restbase spec JSON files to which /rest_v1/?spec can be routed (T397203 T396805)
  • 20:48 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T405978, transfer main graph to newly-reimaged host) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1020.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
  • 20:45 kharlan@deploy2002: Finished scap sync-world: Backport for CheckUser/UserInfoCard: Remove enable-by-default mode for dewiki (T405342) (duration: 11m 05s)
  • 20:41 brett: Enable unified mobile routing on all except en.wikipedia.org - T403510
  • 20:41 kharlan@deploy2002: kharlan: Continuing with sync
  • 20:38 kharlan@deploy2002: kharlan: Backport for CheckUser/UserInfoCard: Remove enable-by-default mode for dewiki (T405342) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:34 kharlan@deploy2002: Started scap sync-world: Backport for CheckUser/UserInfoCard: Remove enable-by-default mode for dewiki (T405342)
  • 20:13 mstyles@deploy2002: Finished scap sync-world: Backport for OATHAuth: Increase 2FA opt-in to 40% of users (T399664) (duration: 09m 08s)
  • 20:09 mstyles@deploy2002: mstyles: Continuing with sync
  • 20:08 mstyles@deploy2002: mstyles: Backport for OATHAuth: Increase 2FA opt-in to 40% of users (T399664) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:05 ejegg: fundraising civicrm upgraded from eac2de65 to 7a81fe1c
  • 20:04 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1020.eqiad.wmnet with OS bullseye
  • 20:04 mstyles@deploy2002: Started scap sync-world: Backport for OATHAuth: Increase 2FA opt-in to 40% of users (T399664)
  • 19:47 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1020.eqiad.wmnet with reason: host reimage
  • 19:44 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1020.eqiad.wmnet with reason: host reimage
  • 19:01 ejegg: standalone SmashPig upgraded from 86bde4e4 to 32dc5c72
  • 18:09 sukhe@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host hcaptcha1002.wikimedia.org
  • 18:08 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha1002.wikimedia.org with OS trixie
  • 17:53 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha1002.wikimedia.org with reason: host reimage
  • 17:47 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha1002.wikimedia.org with reason: host reimage
  • 17:34 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha1002.wikimedia.org with OS trixie
  • 17:34 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
  • 17:34 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
  • 17:34 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) hcaptcha1002.wikimedia.org on all recursors
  • 17:34 sukhe@cumin1003: START - Cookbook sre.dns.wipe-cache hcaptcha1002.wikimedia.org on all recursors
  • 17:34 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:34 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
  • 17:32 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
  • 17:29 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 17:29 sukhe@cumin1003: START - Cookbook sre.ganeti.makevm for new host hcaptcha1002.wikimedia.org
  • 17:26 taavi: taavi@apt1002 ~ $ sudo -i reprepro -C thirdparty/tofu update trixie-wikimedia # T405742
  • 17:05 mutante: releases2003 - re-enabling puppet - reacting to monitoring alert - T405352
  • 16:30 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:26 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1020.eqiad.wmnet with OS bullseye
  • 16:25 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1020.eqiad.wmnet with OS bullseye
  • 16:15 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002"
  • 16:13 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha1002.wikimedia.org
  • 16:13 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:13 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin1003"
  • 16:13 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin1003"
  • 16:11 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2044.codfw.wmnet with OS bullseye
  • 16:09 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 16:05 sukhe@cumin1003: START - Cookbook sre.hosts.decommission for hosts hcaptcha1002.wikimedia.org
  • 16:04 sukhe@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host hcaptcha1002.wikimedia.org
  • 16:03 sukhe@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:59 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 15:59 sukhe@cumin1003: START - Cookbook sre.ganeti.makevm for new host hcaptcha1002.wikimedia.org
  • 15:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:58 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2044.codfw.wmnet with OS bullseye
  • 15:57 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker2003.codfw.wmnet with reason: host reimage
  • 15:55 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:52 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2044.codfw.wmnet with OS bookworm
  • 15:52 elukey@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
  • 15:49 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker2003.codfw.wmnet with reason: host reimage
  • 15:49 elukey@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
  • 15:47 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl1001.eqiad.wmnet
  • 15:47 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:47 jasmine@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin1003"
  • 15:47 jasmine@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin1003"
  • 15:45 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:42 sukhe@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host hcaptcha1002.wikimedia.org
  • 15:42 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host hcaptcha1002.wikimedia.org with OS trixie
  • 15:40 jasmine@cumin1003: START - Cookbook sre.dns.netbox
  • 15:38 bking@cumin2002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker2003.codfw.wmnet with OS bookworm
  • 15:32 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:29 jasmine@cumin1003: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl1001.eqiad.wmnet
  • 15:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2044.codfw.wmnet with reason: host reimage
  • 15:26 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:24 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:24 hashar@deploy2002: Finished deploy [gerrit/gerrit@d0c47da]: Disable component rather than motd plugin (duration: 00m 11s)
  • 15:23 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2044.codfw.wmnet with reason: host reimage
  • 15:23 hashar@deploy2002: Started deploy [gerrit/gerrit@d0c47da]: Disable component rather than motd plugin
  • 15:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:21 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:20 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:13 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1020.eqiad.wmnet with OS bullseye
  • 15:11 jasmine_: homer ‘cr*eqiad’ commit "T383227"
  • 15:09 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2044.codfw.wmnet with OS bookworm
  • 15:09 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
  • 15:03 hashar@deploy2002: Finished deploy [gerrit/gerrit@21d2848]: Disable motd banner: maintenance window has closed - T387833 (duration: 00m 30s)
  • 15:03 brennen@deploy2002: Finished deploy [phabricator/deployment@f2d2c87]: deploy phab1004 for T406597 (duration: 00m 52s)
  • 15:03 hashar@deploy2002: Started deploy [gerrit/gerrit@21d2848]: Disable motd banner: maintenance window has closed - T387833
  • 15:02 brennen@deploy2002: Started deploy [phabricator/deployment@f2d2c87]: deploy phab1004 for T406597
  • 15:02 brennen@deploy2002: Finished deploy [phabricator/deployment@f2d2c87]: deploy phab2002 for T406597 (duration: 00m 31s)
  • 15:01 brennen@deploy2002: Started deploy [phabricator/deployment@f2d2c87]: deploy phab2002 for T406597
  • 15:01 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
  • 14:59 arnaudb@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on phab2002.codfw.wmnet,phab[1004-1005].eqiad.wmnet with reason: T406597
  • 14:58 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bullseye
  • 14:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2043.codfw.wmnet with OS bookworm
  • 14:53 jasmine@deploy2002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl1001.eqiad.wmnet
  • 14:51 jasmine@dns1004: END - running authdns-update
  • 14:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bookworm
  • 14:50 jasmine@dns1004: START - running authdns-update
  • 14:42 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
  • 14:41 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1020.eqiad.wmnet']
  • 14:31 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
  • 14:29 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1020.eqiad.wmnet']
  • 14:22 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 14:22 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 14:22 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 14:21 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha1002.wikimedia.org with OS trixie
  • 14:21 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
  • 14:21 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
  • 14:21 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 14:21 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
  • 14:21 jhancock@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1002"
  • 14:17 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['wdqs1020.eqiad.wmnet']
  • 14:16 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
  • 14:16 jhancock@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1002"
  • 14:11 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1020.eqiad.wmnet']
  • 14:11 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
  • 14:11 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1020.eqiad.wmnet']
  • 14:04 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:04 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Fix calls to incrementStatsKey() (T406569), Fix calls to incrementStatsKey() (T406569) (duration: 09m 58s)
  • 14:01 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1020.eqiad.wmnet']
  • 14:00 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) hcaptcha1002.wikimedia.org on all recursors
  • 14:00 sukhe@cumin1003: START - Cookbook sre.dns.wipe-cache hcaptcha1002.wikimedia.org on all recursors
  • 14:00 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:00 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
  • 14:00 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1002.wikimedia.org - sukhe@cumin1003"
  • 13:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
  • 13:58 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2006.codfw.wmnet with reason: host reimage
  • 13:58 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Fix calls to incrementStatsKey() (T406569), Fix calls to incrementStatsKey() (T406569) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:56 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 13:56 sukhe@cumin1003: START - Cookbook sre.ganeti.makevm for new host hcaptcha1002.wikimedia.org
  • 13:56 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2050.codfw.wmnet']
  • 13:55 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:55 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
  • 13:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
  • 13:54 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Fix calls to incrementStatsKey() (T406569), Fix calls to incrementStatsKey() (T406569)
  • 13:52 jhancock@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2006.codfw.wmnet with reason: host reimage
  • 13:51 sukhe@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host hcaptcha1001.wikimedia.org
  • 13:51 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha1001.wikimedia.org with OS trixie
  • 13:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
  • 13:49 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1020.eqiad.wmnet with OS bullseye
  • 13:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2043.codfw.wmnet with OS bullseye
  • 13:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
  • 13:43 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2043.codfw.wmnet with OS bullseye
  • 13:41 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
  • 13:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
  • 13:36 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
  • 13:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:35 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
  • 13:35 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 13:34 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
  • 13:34 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 13:28 moritzm: rebalance Ganeti codfw/D following vmscape reboots
  • 13:27 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha1001.wikimedia.org with reason: host reimage
  • 13:17 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 13:17 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha1001.wikimedia.org with OS trixie
  • 13:17 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 13:16 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha1001.wikimedia.org - sukhe@cumin1003"
  • 13:16 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha1001.wikimedia.org - sukhe@cumin1003"
  • 13:14 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) hcaptcha1001.wikimedia.org on all recursors
  • 13:14 sukhe@cumin1003: START - Cookbook sre.dns.wipe-cache hcaptcha1001.wikimedia.org on all recursors
  • 13:14 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:14 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1001.wikimedia.org - sukhe@cumin1003"
  • 13:14 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1001.wikimedia.org - sukhe@cumin1003"
  • 13:13 esanders@deploy2002: Finished scap sync-world: Backport for Invalidate Flow cache on enwiktionary (T405080) (duration: 10m 07s)
  • 13:10 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 13:10 sukhe@cumin1003: START - Cookbook sre.ganeti.makevm for new host hcaptcha1001.wikimedia.org
  • 13:09 esanders@deploy2002: esanders: Continuing with sync
  • 13:08 esanders@deploy2002: esanders: Backport for Invalidate Flow cache on enwiktionary (T405080) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:06 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 13:06 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 13:05 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
  • 13:03 esanders@deploy2002: Started scap sync-world: Backport for Invalidate Flow cache on enwiktionary (T405080)
  • 12:25 marostegui@cumin1003: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: After upgrade to 10.11.14', diff saved to https://phabricator.wikimedia.org/P83649 and previous config saved to /var/cache/conftool/dbconfig/20251007-122526-root.json
  • 12:23 moritzm: rebalance Ganeti eqiad/C following vmscape reboots
  • 12:15 slyngshede@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host idp-test2005.wikimedia.org with OS trixie
  • 12:10 marostegui@cumin1003: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: After upgrade to 10.11.14', diff saved to https://phabricator.wikimedia.org/P83647 and previous config saved to /var/cache/conftool/dbconfig/20251007-121020-root.json
  • 11:55 marostegui@cumin1003: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: After upgrade to 10.11.14', diff saved to https://phabricator.wikimedia.org/P83646 and previous config saved to /var/cache/conftool/dbconfig/20251007-115513-root.json
  • 11:50 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 11:50 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 11:49 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 11:49 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 11:48 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:48 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:47 marostegui@cumin1003: dbctl commit (dc=all): 'db2206 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83645 and previous config saved to /var/cache/conftool/dbconfig/20251007-114716-root.json
  • 11:40 marostegui@cumin1003: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: After upgrade to 10.11.14', diff saved to https://phabricator.wikimedia.org/P83644 and previous config saved to /var/cache/conftool/dbconfig/20251007-114007-root.json
  • 11:33 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp-test2005.wikimedia.org with OS trixie
  • 11:32 marostegui@cumin1003: dbctl commit (dc=all): 'db2206 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83643 and previous config saved to /var/cache/conftool/dbconfig/20251007-113210-root.json
  • 11:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test2005.wikimedia.org
  • 11:27 slyngshede@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM idp-test2005.wikimedia.org
  • 11:26 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2005.wikimedia.org
  • 11:25 marostegui@cumin1003: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: After upgrade to 10.11.14', diff saved to https://phabricator.wikimedia.org/P83642 and previous config saved to /var/cache/conftool/dbconfig/20251007-112501-root.json
  • 11:23 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 11:23 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 11:23 slyngshede@cumin1003: START - Cookbook sre.hosts.reboot-single for host idp-test2005.wikimedia.org
  • 11:19 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 11:18 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 11:17 marostegui@cumin1003: dbctl commit (dc=all): 'db2206 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83640 and previous config saved to /var/cache/conftool/dbconfig/20251007-111704-root.json
  • 11:16 marostegui: Upgrade db1169 (s1) to 10.11.14 T406543
  • 11:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1169.eqiad.wmnet with reason: Upgrading
  • 11:14 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1169 T406543', diff saved to https://phabricator.wikimedia.org/P83639 and previous config saved to /var/cache/conftool/dbconfig/20251007-111438-marostegui.json
  • 11:13 moritzm: imported cas 7.1.6.2 for trixie-wikimedia T406455
  • 11:12 moritzm: imported prometheus-jmx-exporter 0.15.0 for trixie-wikimedia T406455
  • 11:08 moritzm: rebalance Ganeti codfw/C following vmscape reboots
  • 11:07 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:07 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:04 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:04 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:01 marostegui@cumin1003: dbctl commit (dc=all): 'db2206 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83638 and previous config saved to /var/cache/conftool/dbconfig/20251007-110158-root.json
  • 10:53 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2206 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83637 and previous config saved to /var/cache/conftool/dbconfig/20251007-105337-marostegui.json
  • 10:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 10:44 slyngshede@dns1004: END - running authdns-update
  • 10:43 slyngshede@dns1004: START - running authdns-update
  • 10:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new dns names - cmooney@cumin1003"
  • 10:38 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new dns names - cmooney@cumin1003"
  • 10:31 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 10:25 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:24 ladsgroup@deploy2002: Finished scap sync-world: Backport for mainstash: Disable multiPrimaryMode (T389893) (duration: 14m 51s)
  • 10:20 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 10:19 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 10:14 ladsgroup@deploy2002: ladsgroup: Backport for mainstash: Disable multiPrimaryMode (T389893) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:09 ladsgroup@deploy2002: Started scap sync-world: Backport for mainstash: Disable multiPrimaryMode (T389893)
  • 10:04 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f7-eqiad
  • 10:04 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-f7-eqiad
  • 10:04 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f6-eqiad
  • 10:04 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-f6-eqiad
  • 10:03 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e7-eqiad
  • 10:03 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-e7-eqiad
  • 10:03 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f5-eqiad
  • 10:03 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-f5-eqiad
  • 10:03 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e6-eqiad
  • 10:03 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-e6-eqiad
  • 10:03 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e5-eqiad
  • 10:03 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-e5-eqiad
  • 10:02 ladsgroup@deploy2002: Finished scap sync-world: Backport for Undeploy FlaggedRevs from lawikisource (T406424) (duration: 09m 34s)
  • 10:00 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1028.eqiad.wmnet onto es1051.eqiad.wmnet
  • 09:59 aqu@deploy2002: Finished deploy [analytics/refinery@21fe78f] (thin): Regular analytics weekly train THIN [analytics/refinery@21fe78fb] (duration: 01m 05s)
  • 09:58 aqu@deploy2002: Started deploy [analytics/refinery@21fe78f] (thin): Regular analytics weekly train THIN [analytics/refinery@21fe78fb]
  • 09:57 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 09:57 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2027 - Depool es2027.codfw.wmnet to then clone it to es2052.codfw.wmnet - fceratto@cumin1002
  • 09:57 ladsgroup@deploy2002: ladsgroup: Backport for Undeploy FlaggedRevs from lawikisource (T406424) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:56 fceratto@cumin1002: START - Cookbook sre.mysql.depool es2027 - Depool es2027.codfw.wmnet to then clone it to es2052.codfw.wmnet - fceratto@cumin1002
  • 09:56 fceratto@cumin1002: START - Cookbook sre.mysql.clone_es of es2027.codfw.wmnet onto es2052.codfw.wmnet
  • 09:55 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1028,1051].eqiad.wmnet with reason: Cloning
  • 09:55 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es2029.codfw.wmnet
  • 09:55 fceratto@cumin1002: START - Cookbook sre.hosts.remove-downtime for es2029.codfw.wmnet
  • 09:54 aqu@deploy2002: Finished deploy [analytics/refinery@21fe78f]: Regular analytics weekly train [analytics/refinery@21fe78fb] (duration: 42m 33s)
  • 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1028 to clone es1051 T406488', diff saved to https://phabricator.wikimedia.org/P83635 and previous config saved to /var/cache/conftool/dbconfig/20251007-095339-marostegui.json
  • 09:52 ladsgroup@deploy2002: Started scap sync-world: Backport for Undeploy FlaggedRevs from lawikisource (T406424)
  • 09:52 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2052.codfw.wmnet']
  • 09:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on es2052.codfw.wmnet with reason: Setting up new ES host
  • 09:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on es2029.codfw.wmnet with reason: Setting up new ES host
  • 09:46 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on es2052.codfw.wmnet with reason: Setting up new ES host
  • 09:33 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on es2052.codfw.wmnet with reason: Setting up new ES host
  • 09:33 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:32 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1050.eqiad.wmnet with OS bookworm
  • 09:26 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - marostegui@cumin1003"
  • 09:25 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - marostegui@cumin1003"
  • 09:22 marostegui@cumin1003: START - Cookbook sre.mysql.clone_es of es1026.eqiad.wmnet onto es1049.eqiad.wmnet
  • 09:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:19 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wikidata: apply
  • 09:19 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1026,1049].eqiad.wmnet with reason: Cloning
  • 09:19 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply
  • 09:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:17 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2052.codfw.wmnet']
  • 09:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1026,1049].eqiad.wmnet with reason: Cloning
  • 09:12 aqu@deploy2002: Started deploy [analytics/refinery@21fe78f]: Regular analytics weekly train [analytics/refinery@21fe78fb]
  • 09:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repool es1029 and depool es1026 to clone es1049 T406488', diff saved to https://phabricator.wikimedia.org/P83634 and previous config saved to /var/cache/conftool/dbconfig/20251007-091011-marostegui.json
  • 09:08 marostegui@cumin1003: dbctl commit (dc=all): 'Depool es1029 to clone es1049 T406488', diff saved to https://phabricator.wikimedia.org/P83633 and previous config saved to /var/cache/conftool/dbconfig/20251007-090826-marostegui.json
  • 09:07 aqu@deploy2002: Finished deploy [analytics/refinery@21fe78f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@21fe78fb] (duration: 01m 12s)
  • 09:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1050.eqiad.wmnet with reason: host reimage
  • 09:06 aqu@deploy2002: Started deploy [analytics/refinery@21fe78f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@21fe78fb]
  • 09:05 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 09:04 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 09:04 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:02 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es1050.eqiad.wmnet with reason: host reimage
  • 08:59 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 08:58 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 08:57 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 08:53 marostegui@cumin1003: dbctl commit (dc=all): 'db2210 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83631 and previous config saved to /var/cache/conftool/dbconfig/20251007-085320-root.json
  • 08:52 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 08:45 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:45 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host es1050.eqiad.wmnet with OS bookworm
  • 08:42 topranks: tighten up acl for ssh access on pfw1-codfw T390939
  • 08:41 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply
  • 08:38 marostegui@cumin1003: dbctl commit (dc=all): 'db2210 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83630 and previous config saved to /var/cache/conftool/dbconfig/20251007-083814-root.json
  • 08:37 hashar: Stopped Gerrit on gerrit2003, deleted /srv/gerrit/git/* and restarted a full replication due to bad file ownership # T387833
  • 08:37 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:27 elukey@cumin1003: START - Cookbook sre.hosts.provision for host es1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:23 marostegui@cumin1003: dbctl commit (dc=all): 'db2210 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83629 and previous config saved to /var/cache/conftool/dbconfig/20251007-082309-root.json
  • 08:20 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1050.eqiad.wmnet with OS bookworm
  • 08:17 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.22 refs T405678
  • 08:08 marostegui@cumin1003: dbctl commit (dc=all): 'db2210 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83628 and previous config saved to /var/cache/conftool/dbconfig/20251007-080803-root.json
  • 08:06 moritzm: installing libsndfile security updates
  • 08:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2210 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83627 and previous config saved to /var/cache/conftool/dbconfig/20251007-080015-marostegui.json
  • 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 07:43 marostegui@cumin1003: dbctl commit (dc=all): 'db2219 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83626 and previous config saved to /var/cache/conftool/dbconfig/20251007-074342-root.json
  • 07:34 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host es1050.eqiad.wmnet with OS bookworm
  • 07:33 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1050.eqiad.wmnet with OS bookworm
  • 07:28 marostegui@cumin1003: dbctl commit (dc=all): 'db2219 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83625 and previous config saved to /var/cache/conftool/dbconfig/20251007-072837-root.json
  • 07:21 dcausse@deploy2002: Finished scap sync-world: Backport for cirrus: stop copying ores weighted_tags (T389053), cirrus: test completion with default sort on simplewiki [2/3] (T404858) (duration: 15m 32s)
  • 07:14 dcausse@deploy2002: dcausse: Continuing with sync
  • 07:13 marostegui@cumin1003: dbctl commit (dc=all): 'db2219 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83624 and previous config saved to /var/cache/conftool/dbconfig/20251007-071331-root.json
  • 07:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host es1050.eqiad.wmnet with OS bookworm
  • 07:11 dcausse@deploy2002: dcausse: Backport for cirrus: stop copying ores weighted_tags (T389053), cirrus: test completion with default sort on simplewiki [2/3] (T404858) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:10 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host es1050.eqiad.wmnet with OS bookworm
  • 07:05 dcausse@deploy2002: Started scap sync-world: Backport for cirrus: stop copying ores weighted_tags (T389053), cirrus: test completion with default sort on simplewiki [2/3] (T404858)
  • 06:58 marostegui@cumin1003: dbctl commit (dc=all): 'db2219 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83623 and previous config saved to /var/cache/conftool/dbconfig/20251007-065825-root.json
  • 06:50 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2219 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83622 and previous config saved to /var/cache/conftool/dbconfig/20251007-065019-marostegui.json
  • 06:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 06:44 kart_: Updated cxserver to 2025-10-06-084053-production (T394982, T403574)
  • 06:42 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:42 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:41 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 06:40 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:40 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host es1050.eqiad.wmnet with OS bookworm
  • 06:35 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1050.eqiad.wmnet with OS bookworm
  • 06:30 marostegui@cumin1003: dbctl commit (dc=all): 'db2237 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P83621 and previous config saved to /var/cache/conftool/dbconfig/20251007-063014-root.json
  • 06:24 moritzm: rebalance Ganeti eqiad/B following vmscape reboots
  • 06:24 moritzm: rebalance Ganeti codfw/B following vmscape reboots
  • 06:15 marostegui@cumin1003: dbctl commit (dc=all): 'db2237 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P83620 and previous config saved to /var/cache/conftool/dbconfig/20251007-061509-root.json
  • 06:07 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 06:06 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 06:00 marostegui@cumin1003: dbctl commit (dc=all): 'db2237 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P83619 and previous config saved to /var/cache/conftool/dbconfig/20251007-060003-root.json
  • 05:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host es1050.eqiad.wmnet with OS bookworm
  • 05:44 marostegui@cumin1003: dbctl commit (dc=all): 'db2237 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P83618 and previous config saved to /var/cache/conftool/dbconfig/20251007-054457-root.json
  • 05:36 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2237.codfw.wmnet with reason: Maintenance
  • 05:36 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2237 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P83617 and previous config saved to /var/cache/conftool/dbconfig/20251007-053628-root.json
  • 05:36 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2237.codfw.wmnet with reason: Maintenance
  • 05:03 ryankemper@cumin2002: START - Cookbook sre.wdqs.restart
  • 05:02 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1018.eqiad.wmnet with OS bullseye
  • 04:02 mwpresync@deploy2002: Pruned MediaWiki: 1.45.0-wmf.19 (duration: 02m 32s)
  • 03:48 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.45.0-wmf.22 refs T405678 (duration: 45m 18s)
  • 03:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.45.0-wmf.22 refs T405678
  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 28s)
  • 01:01 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
  • 00:27 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2005-dev.codfw.wmnet with OS trixie

2025-10-06

  • 23:35 jdlrobson@deploy2002: Finished scap sync-world: Backport for tempUserBanner: Set `relative` position to enable `z-index` (T404122) (duration: 11m 30s)
  • 23:30 jdlrobson@deploy2002: jdlrobson: Continuing with sync
  • 23:29 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 23:28 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 23:28 jdlrobson@deploy2002: jdlrobson: Backport for tempUserBanner: Set `relative` position to enable `z-index` (T404122) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:23 jdlrobson@deploy2002: Started scap sync-world: Backport for tempUserBanner: Set `relative` position to enable `z-index` (T404122)
  • 23:13 jdlrobson@deploy2002: Finished scap sync-world: Backport for Remove old, unused ArticleSummaries Stream (T406361) (duration: 09m 47s)
  • 23:08 jdlrobson@deploy2002: jdlrobson, lmora: Continuing with sync
  • 23:07 jdlrobson@deploy2002: jdlrobson, lmora: Backport for Remove old, unused ArticleSummaries Stream (T406361) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:03 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
  • 23:03 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2017.codfw.wmnet with OS bullseye
  • 23:03 jdlrobson@deploy2002: Started scap sync-world: Backport for Remove old, unused ArticleSummaries Stream (T406361)
  • 22:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1020.eqiad.wmnet with OS bullseye
  • 22:48 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1020.eqiad.wmnet with OS bullseye
  • 22:43 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 22:42 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 22:23 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 22:23 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 22:22 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 21:59 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
  • 21:44 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1020.eqiad.wmnet with OS bullseye
  • 21:43 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1018.eqiad.wmnet with OS bullseye
  • 21:43 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2017.codfw.wmnet with OS bullseye
  • 21:38 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp - 2.8.16 upgrade ()
  • 21:37 eileen: config revision changed from 65339a1a to 02eee6ac
  • 21:35 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp - 2.8.16 upgrade ()
  • 21:32 ryankemper@cumin2002: START - Cookbook sre.wdqs.restart
  • 21:29 sbassett: Deployed security mitigation for T251032
  • 21:28 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
  • 21:25 eileen: civicrm upgraded from 17092e23 to eac2de65
  • 21:25 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:24 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:14 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:11 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:56 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp - 2.8.16 upgrade ()
  • 20:56 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp - 2.8.16 upgrade ()
  • 20:51 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
  • 20:40 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-misc2001.codfw.wmnet with OS bookworm
  • 20:40 jhancock@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1002"
  • 20:39 jhancock@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1002"
  • 20:35 dani@deploy2002: Finished scap sync-world: Backport for Undeploy reader foundational survey on enwiki (T405410), Increase coverage of Design Research participant recruitment survey on jawiki (T405577) (duration: 09m 37s)
  • 20:31 dani@deploy2002: dani: Continuing with sync
  • 20:30 dani@deploy2002: dani: Backport for Undeploy reader foundational survey on enwiki (T405410), Increase coverage of Design Research participant recruitment survey on jawiki (T405577) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:26 dani@deploy2002: Started scap sync-world: Backport for Undeploy reader foundational survey on enwiki (T405410), Increase coverage of Design Research participant recruitment survey on jawiki (T405577)
  • 20:24 arlolra@deploy2002: Finished scap sync-world: Backport for Deploy Parsoid Read Views to 26 Wikipedias (T406250) (duration: 10m 43s)
  • 20:20 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-misc2001.codfw.wmnet with reason: host reimage
  • 20:19 arlolra@deploy2002: arlolra: Continuing with sync
  • 20:19 arlolra@deploy2002: arlolra: Backport for Deploy Parsoid Read Views to 26 Wikipedias (T406250) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:16 jhancock@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-misc2001.codfw.wmnet with reason: host reimage
  • 20:13 arlolra@deploy2002: Started scap sync-world: Backport for Deploy Parsoid Read Views to 26 Wikipedias (T406250)
  • 20:10 samtar@deploy2002: Finished scap sync-world: Backport for ext.wikimediaEvents.WatchlistBaseline: Add page-visited (T401575) (duration: 14m 13s)
  • 20:04 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host mc-misc2001.codfw.wmnet with OS bookworm
  • 20:04 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:04 samtar@deploy2002: samtar: Continuing with sync
  • 20:04 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:02 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp - 2.8.16 upgrade ()
  • 20:01 samtar@deploy2002: samtar: Backport for ext.wikimediaEvents.WatchlistBaseline: Add page-visited (T401575) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 19:58 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp - 2.8.16 upgrade ()
  • 19:58 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Deploy: Introduce output DSL rendering for known_client objects - swfrench@cumin2002"
  • 19:58 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy: Introduce output DSL rendering for known_client objects - swfrench@cumin2002
  • 19:56 swfrench@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy: Introduce output DSL rendering for known_client objects - swfrench@cumin2002
  • 19:56 swfrench@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Deploy: Introduce output DSL rendering for known_client objects - swfrench@cumin2002"
  • 19:56 samtar@deploy2002: Started scap sync-world: Backport for ext.wikimediaEvents.WatchlistBaseline: Add page-visited (T401575)
  • 19:49 btullis@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on P{dse-k8s-worker[1004-1019].eqiad.wmnet} and (A:dse-k8s-master-eqiad or A:dse-k8s-worker-eqiad)
  • 19:45 musikanimal@deploy2002: Finished scap sync-world: Backport for WishRenderer: short-circuit and show error if proposer is invalid (T406194) (duration: 39m 00s)
  • 19:33 musikanimal@deploy2002: musikanimal: Continuing with sync
  • 19:32 musikanimal@deploy2002: musikanimal: Backport for WishRenderer: short-circuit and show error if proposer is invalid (T406194) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 19:13 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
  • 19:10 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
  • 19:07 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp - 2.8.16 upgrade ()
  • 19:07 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp - 2.8.16 upgrade ()
  • 19:07 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 19:07 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 19:06 musikanimal@deploy2002: Started scap sync-world: Backport for WishRenderer: short-circuit and show error if proposer is invalid (T406194)
  • 18:53 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2005-dev.codfw.wmnet with OS trixie
  • 18:40 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin and A:cp - 2.8.16 upgrade ()
  • 18:36 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin and A:cp - 2.8.16 upgrade ()
  • 18:02 ejegg: fundraising python tools upgraded from 3fba9888 to 698309f1
  • 17:59 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2051 gradually with 4 steps - Pooling in new host
  • 17:44 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin and A:cp - 2.8.16 upgrade ()
  • 17:44 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin and A:cp - 2.8.16 upgrade ()
  • 17:42 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-misc2001.codfw.wmnet with OS bookworm
  • 17:42 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:31 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:29 jasmine@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: decom
  • 17:19 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams and A:cp - 2.8.16 upgrade ()
  • 17:17 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams and A:cp - 2.8.16 upgrade ()
  • 17:13 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2051 gradually with 4 steps - Pooling in new host
  • 17:13 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2051 - Depooling host
  • 17:12 fceratto@cumin1002: START - Cookbook sre.mysql.depool es2051 - Depooling host
  • 16:46 otto@deploy2002: Finished deploy [analytics/refinery@21fe78f]: deploying analytics/refinery to an-launcher1002 to pick up change for T389666 (duration: 02m 11s)
  • 16:44 otto@deploy2002: Started deploy [analytics/refinery@21fe78f]: deploying analytics/refinery to an-launcher1002 to pick up change for T389666
  • 16:40 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2051 gradually with 4 steps - Pooling in new host
  • 16:32 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams and A:cp - 2.8.16 upgrade ()
  • 16:30 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams and A:cp - 2.8.16 upgrade ()
  • 16:22 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host mc-misc2001.codfw.wmnet with OS bookworm
  • 16:17 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:06 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:06 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:05 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:55 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 59s)
  • 15:55 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2051 gradually with 4 steps - Pooling in new host
  • 15:53 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 08m 59s)
  • 15:46 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp-test1005.wikimedia.org
  • 15:46 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp-test1005.wikimedia.org with OS trixie
  • 15:39 fceratto@cumin1002: dbctl commit (dc=all): 'Add es2051 T402859', diff saved to https://phabricator.wikimedia.org/P83607 and previous config saved to /var/cache/conftool/dbconfig/20251006-153927-fceratto.json
  • 15:32 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test1005.wikimedia.org with reason: host reimage
  • 15:27 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test1005.wikimedia.org with reason: host reimage
  • 15:24 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:21 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:19 sukhe@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host hcaptcha2002.wikimedia.org
  • 15:19 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha2002.wikimedia.org with OS trixie
  • 15:18 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw and A:cp - 2.8.16 upgrade ()
  • 15:18 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:14 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw and A:cp - 2.8.16 upgrade ()
  • 15:08 moritzm: installing libxslt security updates
  • 15:06 moritzm: installing libcpanel-json-xs-perl security updates
  • 15:03 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha2002.wikimedia.org with reason: host reimage
  • 14:58 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha2002.wikimedia.org with reason: host reimage
  • 14:58 sukhe@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=93) for new host hcaptcha1001.wikimedia.org
  • 14:58 sukhe@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:57 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:56 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host idp-test1005.wikimedia.org with OS trixie
  • 14:55 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test1005.wikimedia.org - slyngshede@cumin1003"
  • 14:55 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test1005.wikimedia.org - slyngshede@cumin1003"
  • 14:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 14:51 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.failover (exit_code=99) from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 14:50 arnaudb@cumin1003: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 14:42 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha2002.wikimedia.org with OS trixie
  • 14:42 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 14:40 marostegui@cumin1003: START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 14:40 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha2002.wikimedia.org - sukhe@cumin1003"
  • 14:40 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha2002.wikimedia.org - sukhe@cumin1003"
  • 14:40 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) hcaptcha2002.wikimedia.org on all recursors
  • 14:40 sukhe@cumin1003: START - Cookbook sre.dns.wipe-cache hcaptcha2002.wikimedia.org on all recursors
  • 14:40 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:39 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha2002.wikimedia.org - sukhe@cumin1003"
  • 14:39 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbproxy1028.eqiad.wmnet with reason: Maintenance
  • 14:39 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha2002.wikimedia.org - sukhe@cumin1003"
  • 14:38 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2058.codfw.wmnet']
  • 14:37 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058.codfw.wmnet']
  • 14:37 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2057.codfw.wmnet']
  • 14:37 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2057.codfw.wmnet']
  • 14:37 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 14:37 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 14:36 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2055.codfw.wmnet']
  • 14:36 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 14:36 sukhe@cumin1003: START - Cookbook sre.ganeti.makevm for new host hcaptcha2002.wikimedia.org
  • 14:34 sukhe@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host hcaptcha2001.wikimedia.org
  • 14:34 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha2001.wikimedia.org with OS trixie
  • 14:34 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw and A:cp - 2.8.16 upgrade ()
  • 14:34 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw and A:cp - 2.8.16 upgrade ()
  • 14:19 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha2001.wikimedia.org with reason: host reimage
  • 14:19 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:17 lucaswerkmeister-wmde@deploy2002: mwscript-k8s job started: namespaceDupes diqwiki --fix # T328207
  • 14:15 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Change Portal talk namespace name for diqwiki (T328207), UserInfoCard: Limit who can view past blocks and remove redundant data points (T406480) (duration: 11m 31s)
  • 14:13 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha2001.wikimedia.org with reason: host reimage
  • 14:10 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, kharlan, cappybaraa: Continuing with sync
  • 14:06 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo and A:cp - 2.8.16 upgrade ()
  • 14:06 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, kharlan, cappybaraa: Backport for Change Portal talk namespace name for diqwiki (T328207), UserInfoCard: Limit who can view past blocks and remove redundant data points (T406480) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:04 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo and A:cp - 2.8.16 upgrade ()
  • 14:04 elukey@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cp2052.codfw.wmnet']
  • 14:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Change Portal talk namespace name for diqwiki (T328207), UserInfoCard: Limit who can view past blocks and remove redundant data points (T406480)
  • 13:58 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 13:58 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 13:58 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 13:58 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 13:57 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 13:57 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 13:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host hcaptcha2001.wikimedia.org with OS trixie
  • 13:53 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha2001.wikimedia.org - sukhe@cumin1003"
  • 13:52 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha2001.wikimedia.org - sukhe@cumin1003"
  • 13:52 cdanis@deploy2002: Finished scap sync-world: Backport for EventStreamConfig - Enable hive ingestion for eventgate-logging-external based streams (T304373) (duration: 12m 24s)
  • 13:52 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2055.codfw.wmnet']
  • 13:52 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) hcaptcha2001.wikimedia.org on all recursors
  • 13:52 sukhe@cumin1003: START - Cookbook sre.dns.wipe-cache hcaptcha2001.wikimedia.org on all recursors
  • 13:52 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:52 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha2001.wikimedia.org - sukhe@cumin1003"
  • 13:52 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2054.codfw.wmnet']
  • 13:52 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha2001.wikimedia.org - sukhe@cumin1003"
  • 13:51 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2054.codfw.wmnet']
  • 13:51 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2053.codfw.wmnet']
  • 13:51 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2053.codfw.wmnet']
  • 13:48 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 13:48 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) hcaptcha1001.wikimedia.org on all recursors
  • 13:48 sukhe@cumin1003: START - Cookbook sre.dns.wipe-cache hcaptcha1001.wikimedia.org on all recursors
  • 13:48 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:47 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1001.wikimedia.org - sukhe@cumin1003"
  • 13:47 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha1001.wikimedia.org - sukhe@cumin1003"
  • 13:46 cdanis@deploy2002: cdanis, otto: Continuing with sync
  • 13:46 sukhe@cumin1003: START - Cookbook sre.ganeti.makevm for new host hcaptcha2001.wikimedia.org
  • 13:46 cdanis@deploy2002: cdanis, otto: Backport for EventStreamConfig - Enable hive ingestion for eventgate-logging-external based streams (T304373) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:44 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 13:44 sukhe@cumin1003: START - Cookbook sre.ganeti.makevm for new host hcaptcha1001.wikimedia.org
  • 13:39 cdanis@deploy2002: Started scap sync-world: Backport for EventStreamConfig - Enable hive ingestion for eventgate-logging-external based streams (T304373)
  • 13:37 bwojtowicz@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 13:34 bwojtowicz@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 13:29 btullis@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{dse-k8s-worker[1004-1019].eqiad.wmnet} and (A:dse-k8s-master-eqiad or A:dse-k8s-worker-eqiad)
  • 13:24 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo and A:cp - 2.8.16 upgrade ()
  • 13:24 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo and A:cp - 2.8.16 upgrade ()
  • 13:19 mfossati@deploy2002: Finished scap sync-world: Backport for ReaderExperiments' ImageBrowsing: use edge uniques (T403259) (duration: 11m 32s)
  • 13:15 mfossati@deploy2002: mfossati: Continuing with sync
  • 13:15 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2052.codfw.wmnet']
  • 13:14 mfossati@deploy2002: mfossati: Backport for ReaderExperiments' ImageBrowsing: use edge uniques (T403259) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:14 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2051.codfw.wmnet']
  • 13:14 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2051.codfw.wmnet']
  • 13:13 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2050.codfw.wmnet']
  • 13:12 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2050.codfw.wmnet']
  • 13:11 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 13:11 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 13:11 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 13:11 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 13:08 mfossati@deploy2002: Started scap sync-world: Backport for ReaderExperiments' ImageBrowsing: use edge uniques (T403259)
  • 12:55 hashar: Restarting Zuul. Deadlocked due to zombie connections with Gerrit
  • 12:48 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:48 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt sretest1005 - jclark@cumin1002"
  • 12:48 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt sretest1005 - jclark@cumin1002"
  • 12:44 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 12:43 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest1005
  • 12:43 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host sretest1005
  • 12:42 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:41 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt sretest1005 - jclark@cumin1002"
  • 12:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt sretest1005 - jclark@cumin1002"
  • 12:39 arnaudb@dns1004: END - running authdns-update
  • 12:38 arnaudb@dns1004: START - running authdns-update
  • 12:37 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 12:29 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.failover (exit_code=99) from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 12:29 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit2003.wikimedia.org
  • 12:28 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit2003.wikimedia.org
  • 12:28 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit1003.wikimedia.org
  • 12:28 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit1003.wikimedia.org
  • 12:28 arnaudb@cumin1003: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 12:27 arnaudb@cumin1003: END (ERROR) - Cookbook sre.gerrit.failover (exit_code=97) from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 12:25 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit2003.wikimedia.org
  • 12:25 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit2003.wikimedia.org
  • 12:25 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit1003.wikimedia.org
  • 12:25 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit1003.wikimedia.org
  • 12:25 arnaudb@cumin1003: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 12:22 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.topology-check (exit_code=99) Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 12:22 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 12:20 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.failover (exit_code=99) from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 12:08 moritzm: upgrade Envoy on yarn/turnilo hosts T403663
  • 12:07 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit2003.wikimedia.org
  • 12:07 hashar: stopped CI Jenkins
  • 12:07 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit2003.wikimedia.org
  • 12:05 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit1003.wikimedia.org
  • 12:05 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit1003.wikimedia.org
  • 12:05 arnaudb@dns1004: START - running authdns-update
  • 12:04 arnaudb@cumin1003: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
  • 12:04 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.topology-check (exit_code=0) Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 12:04 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 11:25 Amir1: dropping interwiki table on group2 (T397367)
  • 11:22 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru and not P{cp7008.magru.wmnet} and A:cp - 2.8.16 upgrade ()
  • 11:19 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru and not P{cp7016.magru.wmnet} and A:cp - 2.8.16 upgrade ()
  • 11:17 Amir1: dropping interwiki table on group1 (T397367)
  • 11:15 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2047.codfw.wmnet']
  • 10:54 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2047.codfw.wmnet']
  • 10:54 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2046.codfw.wmnet']
  • 10:54 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2046.codfw.wmnet']
  • 10:54 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2045.codfw.wmnet']
  • 10:53 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2045.codfw.wmnet']
  • 10:53 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 214657
  • 10:52 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 214657
  • 10:44 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 10:44 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 10:42 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 10:41 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 10:41 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2044.codfw.wmnet']
  • 10:41 elukey: upgraded spicerack to 11.10.0 on all cumin nodes
  • 10:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp-test1005.wikimedia.org on all recursors
  • 10:40 slyngshede@cumin1003: START - Cookbook sre.dns.wipe-cache idp-test1005.wikimedia.org on all recursors
  • 10:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test1005.wikimedia.org - slyngshede@cumin1003"
  • 10:40 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test1005.wikimedia.org - slyngshede@cumin1003"
  • 10:39 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2043.codfw.wmnet']
  • 10:39 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru and not P{cp7016.magru.wmnet} and A:cp - 2.8.16 upgrade ()
  • 10:39 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2043.codfw.wmnet']
  • 10:39 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru and not P{cp7008.magru.wmnet} and A:cp - 2.8.16 upgrade ()
  • 10:39 vgutierrez: upgrading to haproxy 2.8.16 on magru - T406451
  • 10:36 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
  • 10:36 slyngshede@cumin1003: START - Cookbook sre.ganeti.makevm for new host idp-test1005.wikimedia.org
  • 10:33 moritzm: restarting postfix to pick up openssl security updates
  • 10:26 btullis@cumin1003: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:dse-k8s-worker-eqiad
  • 10:12 moritzm: restarting spamassassin/clamav on VRTS to pick up OpenSSL updates
  • 10:12 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp[7008,7016].magru.wmnet} and A:cp - 2.8.16 upgrade ()
  • 10:00 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp[7008,7016].magru.wmnet} and A:cp - 2.8.16 upgrade ()
  • 10:00 vgutierrez: upgrade to haproxy 2.8.16 on cp7008 and cp7016 - T406451
  • 09:55 vgutierrez: fetch haproxy 2.8.16 on thirdparty/haproxy28-bullseye (apt.wm.o) - T406451
  • 09:35 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-eqiad
  • 09:33 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-eqiad
  • 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-codfw
  • 09:26 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-codfw
  • 09:23 moritzm: upgrade Envoy on schema* T403663
  • 09:18 elukey: uploaded spicerack_11.10.0 to apt.wikimedia.org bullseye-wikimedia,bookworm-wikimedia
  • 08:56 btullis@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker-eqiad
  • 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
  • 08:40 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
  • 08:09 moritzm: installing OpenSSL security updates on trixie/bookworm
  • 08:07 dcausse: closing the UTC morning backport window
  • 08:06 dcausse@deploy2002: Finished scap sync-world: Backport for Allow AbuseFilter to block on ganwiki (T406220), cirrus: test completion with default sort on simplewiki [1/3] (T404858) (duration: 12m 48s)
  • 08:01 dcausse@deploy2002: hamishz, dcausse: Continuing with sync
  • 08:00 dcausse@deploy2002: hamishz, dcausse: Backport for Allow AbuseFilter to block on ganwiki (T406220), cirrus: test completion with default sort on simplewiki [1/3] (T404858) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:53 dcausse@deploy2002: Started scap sync-world: Backport for Allow AbuseFilter to block on ganwiki (T406220), cirrus: test completion with default sort on simplewiki [1/3] (T404858)
  • 07:49 kharlan@deploy2002: Finished scap sync-world: Backport for MetricsPlatformAuthPreserveQueryParamsExperiments: Define hCaptcha A/B test (T405239) (duration: 11m 42s)
  • 07:44 kharlan@deploy2002: kharlan: Continuing with sync
  • 07:43 kharlan@deploy2002: kharlan: Backport for MetricsPlatformAuthPreserveQueryParamsExperiments: Define hCaptcha A/B test (T405239) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:37 kharlan@deploy2002: Started scap sync-world: Backport for MetricsPlatformAuthPreserveQueryParamsExperiments: Define hCaptcha A/B test (T405239)
  • 07:34 kharlan@deploy2002: Finished scap sync-world: Backport for Implement AuthPreserveQueryParams for Metrics Platform mpo param (T404622), UserInfoCard: Hide new articles count when likely to be inaccurate (T399096) (duration: 14m 04s)
  • 07:32 moritzm: rebalance Ganeti codfw/A following vmscape reboots
  • 07:30 kharlan@deploy2002: kharlan: Continuing with sync
  • 07:26 kharlan@deploy2002: kharlan: Backport for Implement AuthPreserveQueryParams for Metrics Platform mpo param (T404622), UserInfoCard: Hide new articles count when likely to be inaccurate (T399096) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:20 kharlan@deploy2002: Started scap sync-world: Backport for Implement AuthPreserveQueryParams for Metrics Platform mpo param (T404622), UserInfoCard: Hide new articles count when likely to be inaccurate (T399096)
  • 07:02 kharlan@deploy2002: Finished scap sync-world: Backport for UserInfoCard: Hide reverted edit count if user has more than 1,000 edits (T401466) (duration: 42m 35s)
  • 07:00 moritzm: rebalance Ganeti eqiad/A following vmscape reboots
  • 06:49 kharlan@deploy2002: kharlan: Continuing with sync
  • 06:47 kharlan@deploy2002: kharlan: Backport for UserInfoCard: Hide reverted edit count if user has more than 1,000 edits (T401466) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 06:19 kharlan@deploy2002: Started scap sync-world: Backport for UserInfoCard: Hide reverted edit count if user has more than 1,000 edits (T401466)
  • 06:12 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Upgrade with minor cosmetic tweaks - oblivian@cumin1003"
  • 06:12 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Upgrade with minor cosmetic tweaks - oblivian@cumin1003
  • 06:11 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Upgrade with minor cosmetic tweaks - oblivian@cumin1003
  • 06:11 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Upgrade with minor cosmetic tweaks - oblivian@cumin1003"
  • 05:43 marostegui@dns1006: END - running authdns-update
  • 05:41 marostegui@dns1006: START - running authdns-update
  • 04:49 eileen: civicrm upgraded from ff529ecf to 17092e23
  • 01:15 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 14m 31s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-05

  • 23:50 eileen: civicrm upgraded from 7c31a25c to ff529ecf
  • 23:19 eileen: config revision changed from 0d78c876 to 276d34f0
  • 01:02 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 01m 24s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-04

  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 44s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-10-03

  • 19:37 mutante: LDAP added user btracy to group wmf T405366
  • 19:07 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1002.eqiad.wmnet with reason: WIP
  • 19:07 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1001.eqiad.wmnet with reason: WIP
  • 18:50 ejegg: payments-wiki upgraded from e8ef5539 to 4b8293df
  • 18:10 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 18:00 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 17:56 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 17:56 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 17:33 jasmine@dns1004: END - running authdns-update
  • 17:31 jasmine@dns1004: START - running authdns-update
  • 17:30 jasmine@dns1004: START - running authdns-update
  • 17:27 jasmine@dns1004: START - running authdns-update
  • 17:11 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul2002.codfw.wmnet with reason: WIP
  • 17:09 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 17:08 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1002.eqiad.wmnet with reason: WIP
  • 17:03 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul2001.codfw.wmnet with reason: WIP
  • 17:02 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1001.eqiad.wmnet with reason: WIP
  • 16:59 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 16:57 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 16:47 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 16:17 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2058.codfw.wmnet']
  • 15:44 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2058.codfw.wmnet']
  • 15:44 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2057.codfw.wmnet']
  • 15:38 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Deploy: Fix pending form field preservation on validation failure - swfrench@cumin2002"
  • 15:38 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy: Fix pending form field preservation on validation failure - swfrench@cumin2002
  • 15:37 swfrench@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy: Fix pending form field preservation on validation failure - swfrench@cumin2002
  • 15:37 swfrench@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Deploy: Fix pending form field preservation on validation failure - swfrench@cumin2002"
  • 15:27 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2057.codfw.wmnet']
  • 13:37 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:36 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:18 stevemunene@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:16 stevemunene@cumin1003: START - Cookbook sre.dns.netbox
  • 13:11 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:08 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:08 stevemunene@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 13:07 stevemunene@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 13:02 logmsgbot: reedy Deployed security patch for T406322
  • 12:34 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:23 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:16 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 12:16 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 12:12 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:11 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 12:11 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:11 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 11:57 gkyziridis@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 11:15 gkyziridis@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 11:06 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:05 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:27 topranks: reset PIC 1/0 on cr2-eqiad to configure port 5 speed T402588
  • 10:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on cr[1-2]-eqiad,cr2-eqord,cr1-magru,ssw1-f1-eqiad with reason: reset PIC 0/1 in cr2 to set port 5 speed
  • 10:21 topranks: drain traffic from cr2-codfw <-> ssw1-f1-codfw link to allow for cr2-codfw card reset T402588
  • 10:17 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:15 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:15 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new dns names for cr2-eqiad et-1/0/5.100 interface IPs - cmooney@cumin1003"
  • 10:14 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 10:14 topranks: drain transport circuits on PIC 1/0 of cr2-eqiad to allow for card reboot T402588
  • 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts druid1008.eqiad.wmnet
  • 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new dns names for cr2-eqiad et-1/0/5.100 interface IPs - cmooney@cumin1003"
  • 10:09 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 10:02 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 10:01 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts druid1008.eqiad.wmnet
  • 10:00 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts druid1007.eqiad.wmnet
  • 10:00 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:00 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: druid1007.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 09:59 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: druid1007.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 09:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2057.codfw.wmnet']
  • 09:56 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 09:55 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1012.eqiad.wmnet with OS bookworm
  • 09:48 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts druid1007.eqiad.wmnet
  • 09:44 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2057.codfw.wmnet']
  • 09:44 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 09:44 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2056.codfw.wmnet']
  • 09:40 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2054.codfw.wmnet']
  • 09:33 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1012.eqiad.wmnet with reason: host reimage
  • 09:27 jynus@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1012.eqiad.wmnet with reason: host reimage
  • 09:21 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2054.codfw.wmnet']
  • 09:11 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2054.codfw.wmnet']
  • 09:07 jynus@cumin1003: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 09:04 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2054.codfw.wmnet']
  • 08:59 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2053.codfw.wmnet']
  • 08:46 stevemunene@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 08:46 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es2028.codfw.wmnet onto es2051.codfw.wmnet
  • 08:46 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2028 gradually with 4 steps - Pool es2028.codfw.wmnet in after cloning
  • 08:44 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:44 stevemunene@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 08:43 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:43 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2053.codfw.wmnet']
  • 08:33 brouberol@cumin1003: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 08:31 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2053.codfw.wmnet']
  • 08:29 brouberol@cumin1003: START - Cookbook sre.wdqs.restart
  • 08:25 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2053.codfw.wmnet']
  • 08:25 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2051.codfw.wmnet']
  • 08:24 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2051.codfw.wmnet']
  • 08:21 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2051.codfw.wmnet']
  • 08:05 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2051.codfw.wmnet']
  • 08:00 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2028 gradually with 4 steps - Pool es2028.codfw.wmnet in after cloning
  • 07:51 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2051.codfw.wmnet']
  • 07:44 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2051.codfw.wmnet']
  • 07:44 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2050.codfw.wmnet']
  • 07:43 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2050.codfw.wmnet']
  • 07:43 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:42 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:40 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:39 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:38 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:38 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:16 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:16 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:12 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:12 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 04:47 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 04:41 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 04:40 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 04:32 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 03:36 tstarling@deploy2002: Finished scap sync-world: Backport for Fallback to first result row if none in baselang is found (T406196), Ensure linkUpdateComplete handler is only run for entities (T406192) (duration: 11m 15s)
  • 03:31 tstarling@deploy2002: tstarling: Continuing with sync
  • 03:30 tstarling@deploy2002: tstarling: Backport for Fallback to first result row if none in baselang is found (T406196), Ensure linkUpdateComplete handler is only run for entities (T406192) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 03:24 tstarling@deploy2002: Started scap sync-world: Backport for Fallback to first result row if none in baselang is found (T406196), Ensure linkUpdateComplete handler is only run for entities (T406192)
  • 01:30 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 01:15 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 14m 12s)
  • 01:03 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 01:02 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
  • 00:57 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 00:56 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 00:49 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 00:49 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 00:44 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 00:43 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 00:38 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 00:32 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1012.eqiad.wmnet with OS bookworm

2025-10-02

  • 23:24 samwilson@deploy2002: Finished scap sync-world: Backport for Fetch wikitext from the translation lang subpage, not the baselang (duration: 16m 07s)
  • 23:20 samwilson@deploy2002: samwilson: Continuing with sync
  • 23:10 samwilson@deploy2002: samwilson: Backport for Fetch wikitext from the translation lang subpage, not the baselang synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:08 samwilson@deploy2002: Started scap sync-world: Backport for Fetch wikitext from the translation lang subpage, not the baselang
  • 22:46 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 22:15 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 21:53 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 21:53 zabe@deploy2002: Finished scap sync-world: Backport for Stop setting CategoryLinksSchemaMigrationStage (T299951) (duration: 12m 37s)
  • 21:47 zabe@deploy2002: zabe: Continuing with sync
  • 21:46 zabe@deploy2002: zabe: Backport for Stop setting CategoryLinksSchemaMigrationStage (T299951) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:41 zabe@deploy2002: Started scap sync-world: Backport for Stop setting CategoryLinksSchemaMigrationStage (T299951)
  • 21:37 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 21:30 ejegg: donorwiki upgraded from dc7cda24 to e8ef5539
  • 21:30 ejegg: payments-wiki upgraded from 2b281477 to e8ef5539
  • 21:27 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 21:27 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:26 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:25 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:25 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:17 samtar@deploy2002: Finished scap sync-world: Backport for ext.wikimediaEvents.WatchlistBaseline: Add watchlist baseline metrics (T401575) (duration: 12m 35s)
  • 21:12 samtar@deploy2002: samtar: Continuing with sync
  • 21:10 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:08 samtar@deploy2002: samtar: Backport for ext.wikimediaEvents.WatchlistBaseline: Add watchlist baseline metrics (T401575) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:04 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:04 samtar@deploy2002: Started scap sync-world: Backport for ext.wikimediaEvents.WatchlistBaseline: Add watchlist baseline metrics (T401575)
  • 21:03 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:58 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:47 ebomani@deploy2002: Finished scap sync-world: Backport for CommonSettings.php: Replace usage of $wgCaptchaWhitelist (T277936) (duration: 13m 17s)
  • 20:45 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 20:42 ebomani@deploy2002: reedy, ebomani: Continuing with sync
  • 20:40 ebomani@deploy2002: reedy, ebomani: Backport for CommonSettings.php: Replace usage of $wgCaptchaWhitelist (T277936) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:40 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 20:33 ebomani@deploy2002: Started scap sync-world: Backport for CommonSettings.php: Replace usage of $wgCaptchaWhitelist (T277936)
  • 20:30 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:30 ebernhardson@deploy2002: Finished scap sync-world: Backport for cirrus: Start AB test of did-you-mean profiles (T390858) (duration: 09m 29s)
  • 20:30 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:26 ebernhardson@deploy2002: ebernhardson: Continuing with sync
  • 20:25 ebernhardson@deploy2002: ebernhardson: Backport for cirrus: Start AB test of did-you-mean profiles (T390858) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s4 and s1 in codfw (T405087)', diff saved to https://phabricator.wikimedia.org/P83587 and previous config saved to /var/cache/conftool/dbconfig/20251002-202536-ladsgroup.json
  • 20:23 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:23 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:21 ebernhardson@deploy2002: Started scap sync-world: Backport for cirrus: Start AB test of did-you-mean profiles (T390858)
  • 20:16 dani@deploy2002: Finished scap sync-world: Backport for Deploy reader foundational survey on enwiki (T405410) (duration: 11m 29s)
  • 20:16 ladsgroup@cumin1003: dbctl commit (dc=all): 'Harmonize weights in s1 in eqiad', diff saved to https://phabricator.wikimedia.org/P83586 and previous config saved to /var/cache/conftool/dbconfig/20251002-201611-ladsgroup.json
  • 20:15 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s4 and s1 in eqiad (T405087)', diff saved to https://phabricator.wikimedia.org/P83585 and previous config saved to /var/cache/conftool/dbconfig/20251002-201532-ladsgroup.json
  • 20:15 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:12 dani@deploy2002: dani: Continuing with sync
  • 20:11 dani@deploy2002: dani: Backport for Deploy reader foundational survey on enwiki (T405410) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:09 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:09 ladsgroup@cumin1003: dbctl commit (dc=all): 'Harmonize weights in s8 in eqiad', diff saved to https://phabricator.wikimedia.org/P83584 and previous config saved to /var/cache/conftool/dbconfig/20251002-200948-ladsgroup.json
  • 20:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s8 in codfw (T405087)', diff saved to https://phabricator.wikimedia.org/P83583 and previous config saved to /var/cache/conftool/dbconfig/20251002-200621-ladsgroup.json
  • 20:05 dani@deploy2002: Started scap sync-world: Backport for Deploy reader foundational survey on enwiki (T405410)
  • 20:03 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s8 in eqiad (T405087)', diff saved to https://phabricator.wikimedia.org/P83582 and previous config saved to /var/cache/conftool/dbconfig/20251002-200354-ladsgroup.json
  • 20:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s7 in eqiad (T405087)', diff saved to https://phabricator.wikimedia.org/P83581 and previous config saved to /var/cache/conftool/dbconfig/20251002-200143-ladsgroup.json
  • 19:59 musikanimal@deploy2002: mwscript-k8s job started: extensions/CommunityRequests/maintenance/migrateFromGadget.php --wiki=metawiki --status-csv=wishes-status-migration.csv --wishes
  • 19:54 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s7 in codfw (T405087)', diff saved to https://phabricator.wikimedia.org/P83580 and previous config saved to /var/cache/conftool/dbconfig/20251002-195426-ladsgroup.json
  • 19:49 samtar@deploy2002: Finished scap sync-world: Backport for EventStreamConfig and stream registration for watchlist click tracking (T401575) (duration: 10m 46s)
  • 19:44 samtar@deploy2002: samtar: Continuing with sync
  • 19:44 samtar@deploy2002: samtar: Backport for EventStreamConfig and stream registration for watchlist click tracking (T401575) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 19:38 samtar@deploy2002: Started scap sync-world: Backport for EventStreamConfig and stream registration for watchlist click tracking (T401575)
  • 19:32 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s5 and s6 in eqiad (T405087)', diff saved to https://phabricator.wikimedia.org/P83579 and previous config saved to /var/cache/conftool/dbconfig/20251002-193217-ladsgroup.json
  • 19:29 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s5 and s6 in codfw (T405087)', diff saved to https://phabricator.wikimedia.org/P83578 and previous config saved to /var/cache/conftool/dbconfig/20251002-192928-ladsgroup.json
  • 19:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s2 in codfw (T405087)', diff saved to https://phabricator.wikimedia.org/P83577 and previous config saved to /var/cache/conftool/dbconfig/20251002-192726-ladsgroup.json
  • 19:19 ladsgroup@cumin1003: dbctl commit (dc=all): 'Abolish api group from s2 in eqiad (T405087)', diff saved to https://phabricator.wikimedia.org/P83576 and previous config saved to /var/cache/conftool/dbconfig/20251002-191918-ladsgroup.json
  • 19:14 musikanimal@deploy2002: mwscript-k8s job started: extensions/CommunityRequests/maintenance/migrateFromGadget.php --wiki=metawiki --status-csv=wishes-status-migration.csv --wishes
  • 19:11 ladsgroup@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 19:08 ladsgroup@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 18:58 ladsgroup@deploy2002: Finished scap sync-world: Backport for db-production: Enable shuffle sharding (T405087) (duration: 22m 32s)
  • 18:53 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 18:41 ladsgroup@deploy2002: ladsgroup: Backport for db-production: Enable shuffle sharding (T405087) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:35 ladsgroup@deploy2002: Started scap sync-world: Backport for db-production: Enable shuffle sharding (T405087)
  • 18:27 musikanimal@deploy2002: mwscript-k8s job started: extensions/CommunityRequests/maintenance/migrateFromGadget.php --wiki=metawiki --status-csv=wishes-status-migration.csv --wishes
  • 17:50 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:44 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:43 musikanimal@deploy2002: mwscript-k8s job started: extensions/CommunityRequests/maintenance/migrateFromGadget.php --wiki=metawiki --status-csv=wishes-status-migration.csv --wishes
  • 17:40 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:40 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:25 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in eqiad: Repool services in Eqiad following DC switchover (T399891) - T399891
  • 17:03 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: Repool services in Eqiad following DC switchover (T399891) - T399891
  • 16:42 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad [reason: Repool Eqiad following DC switchover (T399891), T399891]
  • 16:42 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool site eqiad [reason: Repool Eqiad following DC switchover (T399891), T399891]
  • 15:52 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:52 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:52 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox
  • 15:51 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:51 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:51 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:51 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:50 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:46 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:46 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:42 cgoubert@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2035.codfw.wmnet
  • 15:42 cgoubert@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2035.codfw.wmnet
  • 15:36 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:36 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:31 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:31 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:30 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:30 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 15:12 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
  • 15:09 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:08 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:58 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:58 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new dns names for cr1-eqiad et-1/0/5.100 interface IPs - cmooney@cumin1003"
  • 14:58 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new dns names for cr1-eqiad et-1/0/5.100 interface IPs - cmooney@cumin1003"
  • 14:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 14:36 topranks: reset PIC 0/1 on cr1-eqiad to set port speed for port 5 T402588
  • 14:36 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on cr[1-2]-eqiad,ssw1-e1-eqiad with reason: reset PIC 0/1 in cr1-eqiad to set port 5 speed
  • 14:28 topranks: drain link from cr1-eqiad <-> ssw1-e1-eqiad to allow PIC card reboot on cr1-eqiad T402588
  • 14:26 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1030.eqiad.wmnet
  • 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1030.eqiad.wmnet
  • 14:26 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
  • 14:25 taavi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 'k8s.svc.toolsbeta.eqiad1.wikimedia.cloud$' on eqiad recursors
  • 14:25 taavi@cumin1002: START - Cookbook sre.dns.wipe-cache 'k8s.svc.toolsbeta.eqiad1.wikimedia.cloud$' on eqiad recursors
  • 14:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1030.eqiad.wmnet
  • 14:18 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1030.eqiad.wmnet
  • 14:17 topranks: drain transport circuit cr1-eqiad <-> cr1-codfw to allow for PIC card reboot on cr1-eqiad T402588
  • 14:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
  • 14:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1046.eqiad.wmnet
  • 14:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet
  • 14:10 tgr_: UTC afternoon deploys done
  • 14:10 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox
  • 14:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet
  • 14:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
  • 14:08 tgr@deploy2002: Finished scap sync-world: Backport for Enable JWT session cookies on group1 (T399631) (duration: 17m 41s)
  • 14:04 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1046.eqiad.wmnet
  • 14:04 tgr@deploy2002: tgr: Continuing with sync
  • 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2046.codfw.wmnet
  • 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2046.codfw.wmnet
  • 13:58 tgr@deploy2002: tgr: Backport for Enable JWT session cookies on group1 (T399631) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2046.codfw.wmnet
  • 13:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2046.codfw.wmnet
  • 13:51 tgr@deploy2002: Started scap sync-world: Backport for Enable JWT session cookies on group1 (T399631)
  • 13:47 jforrester@deploy2002: Finished scap sync-world: Backport for Revert^2 "Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator" (duration: 11m 39s)
  • 13:44 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:44 moritzm: failover Ganeti master in eqiad to ganeti1048
  • 13:43 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 13:42 jforrester@deploy2002: jforrester: Continuing with sync
  • 13:42 jforrester@deploy2002: jforrester: Backport for Revert^2 "Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:41 jhancock@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2035']
  • 13:39 jayme@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wikikube-worker2035.codfw.wmnet with reason: Hardware failure
  • 13:35 jforrester@deploy2002: Started scap sync-world: Backport for Revert^2 "Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator"
  • 13:34 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for session: Lookup authenticated store first before anon store (T402808), session: Lookup authenticated store first before anon store (T402808) (duration: 12m 56s)
  • 13:29 lucaswerkmeister-wmde@deploy2002: d3r1ck01, lucaswerkmeister-wmde: Continuing with sync
  • 13:27 lucaswerkmeister-wmde@deploy2002: d3r1ck01, lucaswerkmeister-wmde: Backport for session: Lookup authenticated store first before anon store (T402808), session: Lookup authenticated store first before anon store (T402808) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:23 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough
  • 13:21 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for session: Lookup authenticated store first before anon store (T402808), session: Lookup authenticated store first before anon store (T402808)
  • 13:17 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox
  • 13:16 dani@deploy2002: Finished scap sync-world: Backport for Update reader foundational survey on enwiki (T405410) (duration: 11m 54s)
  • 13:11 dani@deploy2002: dani: Continuing with sync
  • 13:11 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox
  • 13:10 dani@deploy2002: dani: Backport for Update reader foundational survey on enwiki (T405410) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:10 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough
  • 13:04 dani@deploy2002: Started scap sync-world: Backport for Update reader foundational survey on enwiki (T405410)
  • 12:57 jhancock@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2035']
  • 12:56 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2046.codfw.wmnet
  • 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2046.codfw.wmnet
  • 12:32 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:31 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2032.codfw.wmnet
  • 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2032.codfw.wmnet
  • 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2032.codfw.wmnet
  • 12:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2032.codfw.wmnet
  • 12:10 moritzm: failover Ganeti master in codfw to ganeti2048
  • 12:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6004.drmrs.wmnet
  • 12:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
  • 12:06 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2028 - Depool es2028.codfw.wmnet to then clone it to es2051.codfw.wmnet - fceratto@cumin1002
  • 12:06 fceratto@cumin1002: START - Cookbook sre.mysql.depool es2028 - Depool es2028.codfw.wmnet to then clone it to es2051.codfw.wmnet - fceratto@cumin1002
  • 12:06 fceratto@cumin1002: START - Cookbook sre.mysql.clone_es of es2028.codfw.wmnet onto es2051.codfw.wmnet
  • 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
  • 11:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6004.drmrs.wmnet
  • 11:45 stevemunene@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on druid[1007-1008].eqiad.wmnet with reason: Decommissioning druid_public hosts
  • 11:40 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:39 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:35 moritzm: failover Ganeti master in drmrs02 to ganeti6002
  • 11:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6002.drmrs.wmnet
  • 11:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
  • 11:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
  • 11:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6002.drmrs.wmnet
  • 11:21 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:20 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:19 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:19 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:18 moritzm: installing postgresql security updates on netboxdb nodes
  • 11:17 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:14 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti6003.drmrs.wmnet
  • 11:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti6003.drmrs.wmnet
  • 11:12 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:12 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling restart_daemons on A:ncredir
  • 11:07 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:05 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:04 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 11:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
  • 11:02 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:02 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/zotero: apply
  • 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
  • 10:59 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:59 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:57 jmm@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling restart_daemons on A:ncredir
  • 10:52 zabe@deploy2002: Finished scap sync-world: Backport for Revert "RevisionStore: Find identical revisions without using rev_sha1" (duration: 11m 06s)
  • 10:48 moritzm: failover Ganeti master in drmrs01 to ganeti6001
  • 10:48 zabe@deploy2002: zabe: Continuing with sync
  • 10:47 zabe@deploy2002: zabe: Backport for Revert "RevisionStore: Find identical revisions without using rev_sha1" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:43 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:41 zabe@deploy2002: Started scap sync-world: Backport for Revert "RevisionStore: Find identical revisions without using rev_sha1"
  • 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6001.drmrs.wmnet
  • 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
  • 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
  • 10:33 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6001.drmrs.wmnet
  • 10:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5004.eqsin.wmnet
  • 10:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
  • 10:15 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 10:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
  • 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5004.eqsin.wmnet
  • 10:11 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 10:02 moritzm: installing OpenSSL security updates on trixie/bookworm
  • 10:02 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 09:59 moritzm: failover Ganeti master in eqsin to ganeti5007
  • 09:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5007.eqsin.wmnet
  • 09:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
  • 09:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
  • 09:43 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5007.eqsin.wmnet
  • 09:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5006.eqsin.wmnet
  • 09:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5006.eqsin.wmnet
  • 09:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5006.eqsin.wmnet
  • 09:17 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.21 refs T405677
  • 09:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
  • 09:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5005.eqsin.wmnet
  • 09:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
  • 09:01 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on es2051.codfw.wmnet with reason: Setting up new ES host
  • 09:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
  • 08:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5005.eqsin.wmnet
  • 08:55 awight@deploy2002: Finished scap sync-world: Backport for Revert "Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator" (T406185 T397401 T401682), UX changes for reference context item (T404690), Nasty fix for main ref change in main+details (T406002) (duration: 48m 54s)
  • 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4005.ulsfo.wmnet
  • 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4005.ulsfo.wmnet
  • 08:43 awight@deploy2002: awight, hashar: Continuing with sync
  • 08:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4005.ulsfo.wmnet
  • 08:39 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4005.ulsfo.wmnet
  • 08:35 hashar@deploy2002: Finished deploy [gerrit/gerrit@3ef5714]: Add a banner for a Gerrit switch over maintenance - T387833 (duration: 00m 12s)
  • 08:35 hashar@deploy2002: Started deploy [gerrit/gerrit@3ef5714]: Add a banner for a Gerrit switch over maintenance - T387833
  • 08:35 hashar@deploy2002: deploy aborted: Add a banner for a Gerrit switch over maintenance - T387833 (duration: 00m 00s)
  • 08:35 hashar@deploy2002: Started deploy [gerrit/gerrit@3ef5714]: Add a banner for a Gerrit switch over maintenance - T387833
  • 08:34 awight@deploy2002: awight, hashar: Backport for Revert "Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator" (T406185 T397401 T401682), UX changes for reference context item (T404690), Nasty fix for main ref change in main+details (T406002) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verif
  • 08:16 brouberol@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-druid1007.eqiad.wmnet with reason: Hosts are being decommissioned
  • 08:10 moritzm: failover Ganeti master in ulsfo to ganeti4008
  • 08:06 awight@deploy2002: Started scap sync-world: Backport for Revert "Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator" (T406185 T397401 T401682), UX changes for reference context item (T404690), Nasty fix for main ref change in main+details (T406002)
  • 08:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
  • 08:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
  • 08:05 root@cumin1003: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) depool for host wikikube-worker2035.codfw.wmnet
  • 08:02 root@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2035.codfw.wmnet
  • 07:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
  • 07:54 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
  • 07:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4007.ulsfo.wmnet
  • 07:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
  • 07:45 hashar@deploy2002: Finished scap sync-world: Backport for Add abusefilter-modify-restricted to enwiki EFM (T405999) (duration: 15m 40s)
  • 07:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
  • 07:41 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4007.ulsfo.wmnet
  • 07:41 hashar@deploy2002: eggroll97, hashar: Continuing with sync
  • 07:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
  • 07:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
  • 07:37 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:36 hashar@deploy2002: eggroll97, hashar: Backport for Add abusefilter-modify-restricted to enwiki EFM (T405999) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
  • 07:29 hashar@deploy2002: Started scap sync-world: Backport for Add abusefilter-modify-restricted to enwiki EFM (T405999)
  • 07:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
  • 07:19 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:07 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2049.codfw.wmnet']
  • 07:07 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 07:06 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 06:26 krinkle@deploy2002: Finished scap sync-world: Backport for Disable wmgUseMdotRouting on Commons (T403510), Disable wmgUseMdotRouting on id, fr, de, es, ru, and ja.wikipedia (T403510) (duration: 23m 01s)
  • 06:21 krinkle@deploy2002: krinkle: Continuing with sync
  • 06:09 krinkle@deploy2002: krinkle: Backport for Disable wmgUseMdotRouting on Commons (T403510), Disable wmgUseMdotRouting on id, fr, de, es, ru, and ja.wikipedia (T403510) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 06:03 krinkle@deploy2002: Started scap sync-world: Backport for Disable wmgUseMdotRouting on Commons (T403510), Disable wmgUseMdotRouting on id, fr, de, es, ru, and ja.wikipedia (T403510)
  • 03:43 musikanimal@deploy2002: mwscript-k8s job started: extensions/CommunityRequests/maintenance/migrateFromGadget.php --wiki=metawiki --status-csv=wishes-status-migration.csv --wishes # T402967
  • 02:55 musikanimal@deploy2002: mwscript-k8s job started: extensions/CommunityRequests/maintenance/migrateFromGadget.php --wiki=metawiki --status-csv=wishes-status-migration.csv --wishes # T402967
  • 02:27 musikanimal@deploy2002: Finished scap sync-world: Backport for Enable debug logging for CommunityRequests (T402967) (duration: 13m 47s)
  • 02:22 musikanimal@deploy2002: musikanimal: Continuing with sync
  • 02:20 musikanimal@deploy2002: musikanimal: Backport for Enable debug logging for CommunityRequests (T402967) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 02:13 musikanimal@deploy2002: Started scap sync-world: Backport for Enable debug logging for CommunityRequests (T402967)
  • 02:02 musikanimal@deploy2002: Finished scap sync-world: Backport for FocusAreaStore: use virtual DB connection when counting wishes (T402967) (duration: 12m 25s)
  • 01:57 musikanimal@deploy2002: musikanimal: Continuing with sync
  • 01:56 musikanimal@deploy2002: musikanimal: Backport for FocusAreaStore: use virtual DB connection when counting wishes (T402967) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 01:50 musikanimal@deploy2002: Started scap sync-world: Backport for FocusAreaStore: use virtual DB connection when counting wishes (T402967)
  • 01:16 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 38s)
  • 01:02 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
  • 01:02 musikanimal@deploy2002: Finished scap sync-world: Backport for WishStore: don't use virtual domain when querying for actor ID (T402967) (duration: 11m 14s)
  • 00:57 musikanimal@deploy2002: musikanimal: Continuing with sync
  • 00:57 musikanimal@deploy2002: musikanimal: Backport for WishStore: don't use virtual domain when querying for actor ID (T402967) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:50 musikanimal@deploy2002: Started scap sync-world: Backport for WishStore: don't use virtual domain when querying for actor ID (T402967)
  • 00:29 musikanimal@deploy2002: Finished scap sync-world: Backport for Increase timeout for MessageIndex lock (T402967) (duration: 13m 30s)
  • 00:22 musikanimal@deploy2002: musikanimal: Continuing with sync
  • 00:22 musikanimal@deploy2002: musikanimal: Backport for Increase timeout for MessageIndex lock (T402967) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:15 musikanimal@deploy2002: Started scap sync-world: Backport for Increase timeout for MessageIndex lock (T402967)

2025-10-01

  • 23:16 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2017.codfw.wmnet with OS bullseye
  • 23:14 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1018.eqiad.wmnet with OS bullseye
  • 22:54 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 22:53 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 22:52 bvibber@deploy2002: Finished scap sync-world: Backport for Add ReaderExperiments extension (T404398), Deploy ReaderExperiments to Beta cluster (T404398), Enable ReaderExperiments on Beta (T404398), Load ReaderExperiments extension in CommonSettings-labs.php (T404398) (duration: 40m 32s)
  • 22:40 bvibber@deploy2002: egardner, bvibber: Continuing with sync
  • 22:39 bvibber@deploy2002: egardner, bvibber: Backport for Add ReaderExperiments extension (T404398), Deploy ReaderExperiments to Beta cluster (T404398), Enable ReaderExperiments on Beta (T404398), Load ReaderExperiments extension in CommonSettings-labs.php (T404398) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes
  • 22:13 TimStarling: migrating wishes to CommunityRequests with migrateFromGadget.php
  • 22:12 bvibber@deploy2002: Started scap sync-world: Backport for Add ReaderExperiments extension (T404398), Deploy ReaderExperiments to Beta cluster (T404398), Enable ReaderExperiments on Beta (T404398), Load ReaderExperiments extension in CommonSettings-labs.php (T404398)
  • 22:08 tstarling@deploy2002: Finished scap sync-world: Backport for Enable CommunityRequests on metawiki (T402967), metawiki: Configure permissions for CommunityRequests (T402967) (duration: 10m 42s)
  • 22:04 tstarling@deploy2002: musikanimal, tstarling: Continuing with sync
  • 22:02 tstarling@deploy2002: musikanimal, tstarling: Backport for Enable CommunityRequests on metawiki (T402967), metawiki: Configure permissions for CommunityRequests (T402967) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:57 tstarling@deploy2002: Started scap sync-world: Backport for Enable CommunityRequests on metawiki (T402967), metawiki: Configure permissions for CommunityRequests (T402967)
  • 21:56 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2017.codfw.wmnet with OS bullseye
  • 21:56 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1018.eqiad.wmnet with OS bullseye
  • 21:40 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 21:40 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 21:39 ryankemper@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 21:36 ryankemper@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 21:35 jforrester@deploy2002: Finished scap sync-world: Backport for Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator (T397401 T401682) (duration: 09m 39s)
  • 21:31 jforrester@deploy2002: jforrester: Continuing with sync
  • 21:30 jforrester@deploy2002: jforrester: Backport for Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator (T397401 T401682) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:25 jforrester@deploy2002: Started scap sync-world: Backport for Enable Wikifunctions client mode on Wiktionaries, Part III, and Incubator (T397401 T401682)
  • 21:18 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 21:17 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 21:17 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 21:16 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 21:15 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 21:15 tstarling@deploy2002: Finished scap sync-world: Backport for Configure CommunityRequests virtual domain (T402967) (duration: 07m 36s)
  • 21:15 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 21:11 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 21:11 tstarling@deploy2002: tstarling: Continuing with sync
  • 21:10 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 21:10 tstarling@deploy2002: tstarling: Backport for Configure CommunityRequests virtual domain (T402967) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:10 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 21:09 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 21:07 tstarling@deploy2002: Started scap sync-world: Backport for Configure CommunityRequests virtual domain (T402967)
  • 21:07 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 21:06 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 21:05 arlolra@deploy2002: Finished scap sync-world: Backport for Revert "Add parsoid support in ProofreadPage extension" (duration: 09m 47s)
  • 21:00 arlolra@deploy2002: arlolra: Continuing with sync
  • 20:59 arlolra@deploy2002: arlolra: Backport for Revert "Add parsoid support in ProofreadPage extension" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:55 arlolra@deploy2002: Started scap sync-world: Backport for Revert "Add parsoid support in ProofreadPage extension"
  • 20:51 derick@deploy2002: Finished scap sync-world: Backport for Revert^2 "session: Enable MultiBackendSessionStore on `group1` wikis" (duration: 12m 46s)
  • 20:46 derick@deploy2002: d3r1ck01, derick: Continuing with sync
  • 20:44 derick@deploy2002: d3r1ck01, derick: Backport for Revert^2 "session: Enable MultiBackendSessionStore on `group1` wikis" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:38 derick@deploy2002: Started scap sync-world: Backport for Revert^2 "session: Enable MultiBackendSessionStore on `group1` wikis"
  • 20:34 derick@deploy2002: Finished scap sync-world: Backport for session: Handle an edge-case in MultiBackendSessionStore::set() (T402808), session: Handle an edge-case in MultiBackendSessionStore::set() (T402808) (duration: 12m 57s)
  • 20:30 derick@deploy2002: derick, d3r1ck01: Continuing with sync
  • 20:27 derick@deploy2002: derick, d3r1ck01: Backport for session: Handle an edge-case in MultiBackendSessionStore::set() (T402808), session: Handle an edge-case in MultiBackendSessionStore::set() (T402808) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:21 derick@deploy2002: Started scap sync-world: Backport for session: Handle an edge-case in MultiBackendSessionStore::set() (T402808), session: Handle an edge-case in MultiBackendSessionStore::set() (T402808)
  • 19:49 mutante: cloud
  • 19:13 kharlan@deploy2002: Finished scap sync-world: Backport for hCaptcha: Enable A/B test for frwiki (T405239) (duration: 26m 24s)
  • 19:11 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:11 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:10 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:10 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:10 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:10 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:10 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:08 kharlan@deploy2002: kharlan: Continuing with sync
  • 18:53 kharlan@deploy2002: kharlan: Backport for hCaptcha: Enable A/B test for frwiki (T405239) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:46 kharlan@deploy2002: Started scap sync-world: Backport for hCaptcha: Enable A/B test for frwiki (T405239)
  • 18:18 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.21 refs T405677
  • 16:39 swfrench@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
  • 16:34 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 16:33 swfrench@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
  • 16:23 kharlan@deploy2002: Finished scap sync-world: Backport for SimpleCaptcha::canSkipCaptcha: Remove unneeded Config parameter, CreateAccountInstrumentationPreAuthenticationProvider: Don't create event if user can skip CAPTCHA (T405239), CreateAccountInstrumentationPreAuthenticationProvider: Don't create event if user can skip CAPTCHA (T405239), [[gerrit:1192923|Sim
  • 16:19 kharlan@deploy2002: kharlan: Continuing with sync
  • 16:17 kharlan@deploy2002: kharlan: Backport for SimpleCaptcha::canSkipCaptcha: Remove unneeded Config parameter, CreateAccountInstrumentationPreAuthenticationProvider: Don't create event if user can skip CAPTCHA (T405239), CreateAccountInstrumentationPreAuthenticationProvider: Don't create event if user can skip CAPTCHA (T405239), [[gerrit:1192923|SimpleCaptcha::canSk
  • 16:15 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 16:10 kharlan@deploy2002: Started scap sync-world: Backport for SimpleCaptcha::canSkipCaptcha: Remove unneeded Config parameter, CreateAccountInstrumentationPreAuthenticationProvider: Don't create event if user can skip CAPTCHA (T405239), CreateAccountInstrumentationPreAuthenticationProvider: Don't create event if user can skip CAPTCHA (T405239), [[gerrit:1192923|Simp
  • 16:07 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:07 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:57 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:56 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:51 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:51 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:49 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:49 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:46 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:46 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:35 claime: Finished eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
  • 15:34 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 15:34 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 15:34 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 15:33 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 15:33 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 15:33 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 15:33 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 15:32 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 15:32 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:32 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:31 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 15:31 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 15:30 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 15:30 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 15:28 cgoubert@deploy2002: Finished scap sync-world: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703 (duration: 03m 16s)
  • 15:27 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 15:26 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 15:26 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:26 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:26 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repool db1259 after maint T401906', diff saved to https://phabricator.wikimedia.org/P83573 and previous config saved to /var/cache/conftool/dbconfig/20251001-152620-ladsgroup.json
  • 15:26 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 15:25 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 15:25 cgoubert@deploy2002: Started scap sync-world: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
  • 15:24 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 15:24 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 15:24 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 15:23 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 15:23 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 15:23 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 15:22 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 15:22 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 15:21 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: apply
  • 15:21 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: apply
  • 15:21 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:21 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:21 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply
  • 15:20 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/ratelimit: apply
  • 15:20 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
  • 15:20 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
  • 15:19 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 15:19 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 15:19 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 15:18 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 15:18 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 15:18 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 15:18 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 15:17 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 15:17 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-videoscaler: apply
  • 15:17 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-videoscaler: apply
  • 15:16 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 15:16 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 15:16 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:16 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:15 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 15:07 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 15:05 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 15:04 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 15:04 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:49 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 14:45 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 14:45 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 14:44 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 14:44 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 14:44 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 14:44 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 14:44 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 14:44 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:43 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:43 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 14:41 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:41 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 14:41 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 14:41 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:40 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 14:40 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:40 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 14:40 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 14:38 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 14:38 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 14:38 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 14:38 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 14:38 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 14:37 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 14:34 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 14:33 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 14:33 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 14:32 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 14:32 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 14:31 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 14:31 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
  • 14:31 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
  • 14:30 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 14:30 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 14:30 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 14:29 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 14:29 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 14:29 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 14:28 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 14:28 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 14:26 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 14:26 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 14:26 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 14:26 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:25 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:25 cgoubert@deploy2002: Started scap sync-world: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
  • 14:25 cgoubert@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703 (duration: 201m 05s)
  • 14:24 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 14:24 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 14:24 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 14:24 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 14:24 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 14:23 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 14:23 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 14:22 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 14:22 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 14:21 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
  • 14:21 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/echostore: apply
  • 14:20 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 14:20 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 14:19 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 14:19 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 14:18 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 14:18 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 14:18 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 14:17 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 14:16 cgoubert@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=thumbor.*,name=codfw
  • 14:16 cgoubert@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=swift.*,name=eqiad
  • 14:16 cgoubert@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=thumbor.*,name=eqiad
  • 14:16 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 14:15 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 14:14 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 14:14 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
  • 14:14 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 14:13 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
  • 14:12 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 14:11 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 14:11 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:11 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:11 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
  • 14:09 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
  • 14:08 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:08 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:08 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 14:06 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 14:06 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 14:06 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 14:06 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: apply
  • 14:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1259 (T401906)', diff saved to https://phabricator.wikimedia.org/P83572 and previous config saved to /var/cache/conftool/dbconfig/20251001-140538-fceratto.json
  • 14:05 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 14:05 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 14:04 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: apply
  • 14:04 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1259 (T401906)', diff saved to https://phabricator.wikimedia.org/P83571 and previous config saved to /var/cache/conftool/dbconfig/20251001-140422-fceratto.json
  • 14:04 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1259.eqiad.wmnet with reason: Maintenance
  • 14:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T401906)', diff saved to https://phabricator.wikimedia.org/P83570 and previous config saved to /var/cache/conftool/dbconfig/20251001-140400-fceratto.json
  • 14:03 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:02 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/apertium: apply
  • 14:01 cgoubert@cumin1003: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=toolhub.*
  • 14:00 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 13:58 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 13:56 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=wdqs2016\.codfw\.wmnet
  • 13:53 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 13:51 jelto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 239 hosts with reason: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
  • 13:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P83569 and previous config saved to /var/cache/conftool/dbconfig/20251001-134852-fceratto.json
  • 13:46 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 13:44 SandraEbele_: Deployed refinery-source using jenkins(weekly deployment train)
  • 13:44 cgoubert@cumin1003: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) pool for host wikikube-ctrl[1001-1004].eqiad.wmnet
  • 13:44 cgoubert@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-ctrl[1001-1004].eqiad.wmnet
  • 13:35 cgoubert@cumin1003: END (FAIL) - Cookbook sre.k8s.wipe-cluster (exit_code=99) Wipe the K8s cluster wikikube-eqiad: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
  • 13:35 cgoubert@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 13:34 cgoubert@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
  • 13:34 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:33 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:33 cgoubert@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 13:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P83568 and previous config saved to /var/cache/conftool/dbconfig/20251001-133344-fceratto.json
  • 13:33 cgoubert@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 13:33 cgoubert@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:31 cgoubert@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:31 cgoubert@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:30 cgoubert@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 13:30 cgoubert@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 13:30 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 13:30 cgoubert@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 13:30 cgoubert@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:30 cgoubert@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 13:29 cgoubert@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:28 cgoubert@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:28 cgoubert@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:24 cgoubert@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 13:24 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:24 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:23 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T401906)', diff saved to https://phabricator.wikimedia.org/P83566 and previous config saved to /var/cache/conftool/dbconfig/20251001-131836-fceratto.json
  • 13:17 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1254 (T401906)', diff saved to https://phabricator.wikimedia.org/P83565 and previous config saved to /var/cache/conftool/dbconfig/20251001-131719-fceratto.json
  • 13:17 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1254.eqiad.wmnet with reason: Maintenance
  • 13:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 13:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T401906)', diff saved to https://phabricator.wikimedia.org/P83564 and previous config saved to /var/cache/conftool/dbconfig/20251001-131639-fceratto.json
  • 13:13 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 13:10 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repool db1172 after upgrade T406008', diff saved to https://phabricator.wikimedia.org/P83563 and previous config saved to /var/cache/conftool/dbconfig/20251001-131033-ladsgroup.json
  • 13:07 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1258* gradually with 4 steps - Work done
  • 13:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P83561 and previous config saved to /var/cache/conftool/dbconfig/20251001-130131-fceratto.json
  • 12:56 cgoubert@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=thumbor.*,name=eqiad
  • 12:53 cgoubert@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=swift.*,name=eqiad
  • 12:51 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depool db1172 for upgrade T406008', diff saved to https://phabricator.wikimedia.org/P83559 and previous config saved to /var/cache/conftool/dbconfig/20251001-125120-ladsgroup.json
  • 12:50 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1172.eqiad.wmnet with reason: Upgrade to 10.11
  • 12:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P83558 and previous config saved to /var/cache/conftool/dbconfig/20251001-124622-fceratto.json
  • 12:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T401906)', diff saved to https://phabricator.wikimedia.org/P83556 and previous config saved to /var/cache/conftool/dbconfig/20251001-123115-fceratto.json
  • 12:31 cgoubert@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=swift.*,name=eqiad
  • 12:30 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T401906)', diff saved to https://phabricator.wikimedia.org/P83555 and previous config saved to /var/cache/conftool/dbconfig/20251001-122959-fceratto.json
  • 12:29 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 12:29 cgoubert@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=thumbor.*,name=eqiad
  • 12:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T401906)', diff saved to https://phabricator.wikimedia.org/P83554 and previous config saved to /var/cache/conftool/dbconfig/20251001-122936-fceratto.json
  • 12:27 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
  • 12:21 ladsgroup@cumin1003: START - Cookbook sre.mysql.pool db1258* gradually with 4 steps - Work done
  • 12:21 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1258.eqiad.wmnet
  • 12:19 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
  • 12:19 mvernon@cumin2002: END (ERROR) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=97) rolling restart_daemons on A:swift-fe-eqiad
  • 12:19 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
  • 12:15 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1258 - Upgrading db1258.eqiad.wmnet
  • 12:15 ladsgroup@cumin1003: START - Cookbook sre.mysql.depool db1258 - Upgrading db1258.eqiad.wmnet
  • 12:15 ladsgroup@cumin1003: START - Cookbook sre.mysql.upgrade for db1258.eqiad.wmnet
  • 12:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P83552 and previous config saved to /var/cache/conftool/dbconfig/20251001-121429-fceratto.json
  • 12:13 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depool db1258 T406116', diff saved to https://phabricator.wikimedia.org/P83551 and previous config saved to /var/cache/conftool/dbconfig/20251001-121339-ladsgroup.json
  • 12:12 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 12:11 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 12:08 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 12:08 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 12:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Promote db1255 to x3 primary T406116', diff saved to https://phabricator.wikimedia.org/P83550 and previous config saved to /var/cache/conftool/dbconfig/20251001-120629-ladsgroup.json
  • 12:06 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 12:06 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 12:06 Amir1: Starting x3 eqiad failover from db1258 to db1255 - T406116
  • 12:05 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:04 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Set db1255 with weight 0 T406116', diff saved to https://phabricator.wikimedia.org/P83549 and previous config saved to /var/cache/conftool/dbconfig/20251001-120140-ladsgroup.json
  • 12:00 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 16 hosts with reason: Primary switchover x3 T406116
  • 11:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P83548 and previous config saved to /var/cache/conftool/dbconfig/20251001-115922-fceratto.json
  • 11:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:59 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:58 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:49 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:48 cgoubert@cumin1003: START - Cookbook sre.k8s.wipe-cluster Wipe the K8s cluster wikikube-eqiad: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
  • 11:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T401906)', diff saved to https://phabricator.wikimedia.org/P83547 and previous config saved to /var/cache/conftool/dbconfig/20251001-114414-fceratto.json
  • 11:43 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T401906)', diff saved to https://phabricator.wikimedia.org/P83546 and previous config saved to /var/cache/conftool/dbconfig/20251001-114259-fceratto.json
  • 11:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 11:42 hnowlan: manually bumped thumbor replicas in codfw to 140
  • 11:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 11:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T401906)', diff saved to https://phabricator.wikimedia.org/P83545 and previous config saved to /var/cache/conftool/dbconfig/20251001-114214-fceratto.json
  • 11:41 cgoubert@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=thumbor.*,name=eqiad
  • 11:39 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:39 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:37 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:37 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:35 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:35 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:29 cgoubert@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=swift.*,name=eqiad
  • 11:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P83544 and previous config saved to /var/cache/conftool/dbconfig/20251001-112707-fceratto.json
  • 11:25 Amir1: dropping two unused tables in phabricator db (T403542)
  • 11:18 cgoubert@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=thumbor.*,name=codfw
  • 11:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P83542 and previous config saved to /var/cache/conftool/dbconfig/20251001-111159-fceratto.json
  • 11:05 cgoubert@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=toolhub.*
  • 11:04 cgoubert@cumin1003: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99) depool toolhub in eqiad: maintenance
  • 11:04 cgoubert@cumin1003: START - Cookbook sre.discovery.service-route depool toolhub in eqiad: maintenance
  • 11:03 cgoubert@deploy2002: Locking from deployment [ALL REPOSITORIES]: eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
  • 11:03 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 11:03 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 11:03 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 11:03 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:02 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:02 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 11:02 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply
  • 11:01 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/ratelimit: apply
  • 11:01 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 11:01 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply
  • 11:01 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 11:01 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/ratelimit: apply
  • 11:00 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
  • 10:59 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 10:59 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
  • 10:59 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 10:59 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 10:58 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 10:58 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 10:58 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 10:57 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 10:57 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 10:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T401906)', diff saved to https://phabricator.wikimedia.org/P83541 and previous config saved to /var/cache/conftool/dbconfig/20251001-105652-fceratto.json
  • 10:55 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T401906)', diff saved to https://phabricator.wikimedia.org/P83540 and previous config saved to /var/cache/conftool/dbconfig/20251001-105538-fceratto.json
  • 10:55 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 10:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T401906)', diff saved to https://phabricator.wikimedia.org/P83539 and previous config saved to /var/cache/conftool/dbconfig/20251001-105514-fceratto.json
  • 10:55 claime: Starting eqiad Wikikube kubernetes cluster upgrade to 1.31 - T405703
  • 10:45 hashar@deploy2002: Finished scap sync-world: Backport for Revert "Replace LoginNotify::getInstance with service injection" (T406094) (duration: 13m 47s)
  • 10:40 hashar@deploy2002: hashar, dreamyjazz: Continuing with sync
  • 10:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P83538 and previous config saved to /var/cache/conftool/dbconfig/20251001-104006-fceratto.json
  • 10:36 hashar@deploy2002: hashar, dreamyjazz: Backport for Revert "Replace LoginNotify::getInstance with service injection" (T406094) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:31 hashar@deploy2002: Started scap sync-world: Backport for Revert "Replace LoginNotify::getInstance with service injection" (T406094)
  • 10:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P83537 and previous config saved to /var/cache/conftool/dbconfig/20251001-102458-fceratto.json
  • 10:11 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 10:11 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 10:11 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:10 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 10:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T401906)', diff saved to https://phabricator.wikimedia.org/P83536 and previous config saved to /var/cache/conftool/dbconfig/20251001-100951-fceratto.json
  • 10:09 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:08 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 10:08 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T401906)', diff saved to https://phabricator.wikimedia.org/P83535 and previous config saved to /var/cache/conftool/dbconfig/20251001-100837-fceratto.json
  • 10:08 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 10:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T401906)', diff saved to https://phabricator.wikimedia.org/P83534 and previous config saved to /var/cache/conftool/dbconfig/20251001-100814-fceratto.json
  • 09:59 kharlan@deploy2002: Finished scap sync-world: Backport for CreateAccount: Fix server side logging of CAPTCHA class (T405239), CreateAccount: Fix server side logging of CAPTCHA class (T405239) (duration: 15m 47s)
  • 09:54 kharlan@deploy2002: kharlan: Continuing with sync
  • 09:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P83533 and previous config saved to /var/cache/conftool/dbconfig/20251001-095306-fceratto.json
  • 09:50 kharlan@deploy2002: kharlan: Backport for CreateAccount: Fix server side logging of CAPTCHA class (T405239), CreateAccount: Fix server side logging of CAPTCHA class (T405239) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:48 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 09:44 kharlan@deploy2002: Started scap sync-world: Backport for CreateAccount: Fix server side logging of CAPTCHA class (T405239), CreateAccount: Fix server side logging of CAPTCHA class (T405239)
  • 09:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P83532 and previous config saved to /var/cache/conftool/dbconfig/20251001-093758-fceratto.json
  • 09:28 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 09:28 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 09:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T401906)', diff saved to https://phabricator.wikimedia.org/P83531 and previous config saved to /var/cache/conftool/dbconfig/20251001-092251-fceratto.json
  • 09:21 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T401906)', diff saved to https://phabricator.wikimedia.org/P83530 and previous config saved to /var/cache/conftool/dbconfig/20251001-092136-fceratto.json
  • 09:21 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 09:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T401906)', diff saved to https://phabricator.wikimedia.org/P83529 and previous config saved to /var/cache/conftool/dbconfig/20251001-092112-fceratto.json
  • 09:17 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 09:17 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 09:14 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:14 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:12 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:11 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:06 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:06 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P83528 and previous config saved to /var/cache/conftool/dbconfig/20251001-090604-fceratto.json
  • 08:57 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 08:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P83527 and previous config saved to /var/cache/conftool/dbconfig/20251001-085056-fceratto.json
  • 08:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T401906)', diff saved to https://phabricator.wikimedia.org/P83526 and previous config saved to /var/cache/conftool/dbconfig/20251001-083549-fceratto.json
  • 08:34 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T401906)', diff saved to https://phabricator.wikimedia.org/P83525 and previous config saved to /var/cache/conftool/dbconfig/20251001-083435-fceratto.json
  • 08:34 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 08:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T401906)', diff saved to https://phabricator.wikimedia.org/P83524 and previous config saved to /var/cache/conftool/dbconfig/20251001-083412-fceratto.json
  • 08:19 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
  • 08:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P83523 and previous config saved to /var/cache/conftool/dbconfig/20251001-081905-fceratto.json
  • 08:13 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
  • 08:10 Emperor: restart swift on ms-fe2012 T360913
  • 08:08 bwojtowicz@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 08:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P83522 and previous config saved to /var/cache/conftool/dbconfig/20251001-080357-fceratto.json
  • 07:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T401906)', diff saved to https://phabricator.wikimedia.org/P83521 and previous config saved to /var/cache/conftool/dbconfig/20251001-074850-fceratto.json
  • 07:47 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T401906)', diff saved to https://phabricator.wikimedia.org/P83520 and previous config saved to /var/cache/conftool/dbconfig/20251001-074736-fceratto.json
  • 07:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 07:10 kharlan@deploy2002: Finished scap sync-world: Backport for CreateAccount: Track interactions with the captchaWord field (T394744), CreateAccount: Record the CAPTCHA class used in account creation funnel (T405239) (duration: 14m 09s)
  • 07:05 kharlan@deploy2002: kharlan: Continuing with sync
  • 07:02 kharlan@deploy2002: kharlan: Backport for CreateAccount: Track interactions with the captchaWord field (T394744), CreateAccount: Record the CAPTCHA class used in account creation funnel (T405239) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 06:55 kharlan@deploy2002: Started scap sync-world: Backport for CreateAccount: Track interactions with the captchaWord field (T394744), CreateAccount: Record the CAPTCHA class used in account creation funnel (T405239)
  • 06:40 kharlan@deploy2002: Finished scap sync-world: Backport for CreateAccount: Record the CAPTCHA class used in account creation funnel (T405239), CreateAccount: Track interactions with the captchaWord field (T394744) (duration: 22m 34s)
  • 06:35 kharlan@deploy2002: kharlan: Continuing with sync
  • 06:22 kharlan@deploy2002: kharlan: Backport for CreateAccount: Record the CAPTCHA class used in account creation funnel (T405239), CreateAccount: Track interactions with the captchaWord field (T394744) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 06:17 kharlan@deploy2002: Started scap sync-world: Backport for CreateAccount: Record the CAPTCHA class used in account creation funnel (T405239), CreateAccount: Track interactions with the captchaWord field (T394744)
  • 04:54 TimStarling: on x1 metawiki creating tables for CommunityRequests
  • 02:31 musikanimal@deploy2002: Finished scap sync-world: Backport for AbstractRenderer: fix extistence dependency on Votes subpage (duration: 12m 19s)
  • 02:26 musikanimal@deploy2002: musikanimal: Continuing with sync
  • 02:26 musikanimal@deploy2002: musikanimal: Backport for AbstractRenderer: fix extistence dependency on Votes subpage synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 02:19 musikanimal@deploy2002: Started scap sync-world: Backport for AbstractRenderer: fix extistence dependency on Votes subpage
  • 01:52 musikanimal@deploy2002: Finished scap sync-world: Backport for Call WikiPage::doPurge to try and clear cache after language is set (T404748) (duration: 10m 47s)
  • 01:47 musikanimal@deploy2002: musikanimal: Continuing with sync
  • 01:46 musikanimal@deploy2002: musikanimal: Backport for Call WikiPage::doPurge to try and clear cache after language is set (T404748) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 01:41 musikanimal@deploy2002: Started scap sync-world: Backport for Call WikiPage::doPurge to try and clear cache after language is set (T404748)
  • 01:28 musikanimal@deploy2002: Finished scap sync-world: Backport for migrateFromGadget: add a few more missing transformations (T405826 T404138 T404234) (duration: 10m 53s)
  • 01:23 musikanimal@deploy2002: musikanimal: Continuing with sync
  • 01:22 musikanimal@deploy2002: musikanimal: Backport for migrateFromGadget: add a few more missing transformations (T405826 T404138 T404234) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 01:17 musikanimal@deploy2002: Started scap sync-world: Backport for migrateFromGadget: add a few more missing transformations (T405826 T404138 T404234)
  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 33s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
  • 00:37 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbprov1007.eqiad.wmnet with OS bookworm
  • 00:00 krinkle@deploy2002: Finished scap sync-world: Backport for Disable wmgUseMdotRouting on Wikidata (T403510) (duration: 13m 23s)

