Server Admin Log

From Wikitech

2025-07-13

  • 18:04 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2006-dev.codfw.wmnet with OS bookworm
  • 17:43 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
  • 17:36 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
  • 17:17 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudnet2006-dev.codfw.wmnet with OS bookworm
  • 14:11 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2006-dev.codfw.wmnet with OS bookworm
  • 13:47 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
  • 13:42 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
  • 13:24 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudnet2006-dev.codfw.wmnet with OS bookworm
  • 13:23 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudnet2006-dev.codfw.wmnet with OS bullseye
  • 13:14 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudnet2006-dev.codfw.wmnet with OS bullseye

2025-07-12

  • 21:04 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1259.eqiad.wmnet with OS bookworm
  • 21:04 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 21:04 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 20:48 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1259.eqiad.wmnet with reason: host reimage
  • 20:42 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1259.eqiad.wmnet with reason: host reimage
  • 20:25 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host db1259.eqiad.wmnet with OS bookworm
  • 20:25 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1259.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:11 vriley@cumin1002: START - Cookbook sre.hosts.provision for host db1259.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:10 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1259.eqiad.wmnet with OS bookworm
  • 19:53 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host db1259.eqiad.wmnet with OS bookworm
  • 19:45 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1259.eqiad.wmnet with OS bookworm
  • 08:47 moritzm: restarted Tomcat on idp1004

2025-07-11

  • 22:26 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host db1259.eqiad.wmnet with OS bookworm
  • 22:10 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1259.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host db1259.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:49 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1259
  • 21:48 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host db1259
  • 21:46 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:46 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt db1259 - vriley@cumin1002"
  • 21:46 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt db1259 - vriley@cumin1002"
  • 21:43 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 18:24 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1008.eqiad.wmnet with OS bullseye
  • 18:23 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1007.eqiad.wmnet with OS bullseye
  • 18:20 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1006.eqiad.wmnet with OS bullseye
  • 18:09 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1008.eqiad.wmnet with reason: host reimage
  • 18:07 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1007.eqiad.wmnet with reason: host reimage
  • 18:06 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1008.eqiad.wmnet with reason: host reimage
  • 18:03 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1007.eqiad.wmnet with reason: host reimage
  • 18:03 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage
  • 17:57 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage
  • 17:48 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1008.eqiad.wmnet with OS bullseye
  • 17:45 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1007.eqiad.wmnet with OS bullseye
  • 17:39 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bullseye
  • 17:32 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1035.eqiad.wmnet with OS bullseye
  • 17:11 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage
  • 17:10 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqsin [reason: done testing issues with primary arelion link, T399221]
  • 17:10 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool site eqsin [reason: done testing issues with primary arelion link, T399221]
  • 16:51 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1035.eqiad.wmnet with OS bullseye
  • 16:51 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1035.eqiad.wmnet with OS bullseye
  • 16:46 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1036.eqiad.wmnet with OS bullseye
  • 16:31 topranks: drain Arelion CCT from codfw to eqsin - still see minor packet loss which is affecting purged T399221
  • 16:19 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
  • 16:16 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
  • 15:56 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1036.eqiad.wmnet with OS bullseye
  • 15:54 topranks: un-drain Arelion CCT from codfw to eqsin T399221
  • 15:44 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 15:44 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 15:39 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:38 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1036.eqiad.wmnet with OS bullseye
  • 15:38 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:36 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:28 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqsin [reason: testing issues with primary arelion link, T399221]
  • 15:28 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool site eqsin [reason: testing issues with primary arelion link, T399221]
  • 15:06 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 14:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78933 and previous config saved to /var/cache/conftool/dbconfig/20250711-145205-root.json
  • 14:48 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage
  • 14:44 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage
  • 14:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78932 and previous config saved to /var/cache/conftool/dbconfig/20250711-143659-root.json
  • 14:27 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 14:25 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 14:25 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 14:24 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 14:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78931 and previous config saved to /var/cache/conftool/dbconfig/20250711-142154-root.json
  • 14:10 akosiaris: sudo swapoff /dev/md1 on cloudcephosd1036 T399281
  • 14:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78930 and previous config saved to /var/cache/conftool/dbconfig/20250711-140648-root.json
  • 13:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2242 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78929 and previous config saved to /var/cache/conftool/dbconfig/20250711-135919-marostegui.json
  • 13:59 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2242.codfw.wmnet with reason: Maintenance
  • 13:55 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 13:48 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 13:48 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 13:41 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 13:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2187 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78927 and previous config saved to /var/cache/conftool/dbconfig/20250711-133539-root.json
  • 13:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2187 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78925 and previous config saved to /var/cache/conftool/dbconfig/20250711-132034-root.json
  • 13:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1050.eqiad.wmnet with OS bookworm
  • 13:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1049.eqiad.wmnet with OS bookworm
  • 13:11 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 13:10 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1051.eqiad.wmnet with OS bookworm
  • 13:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2187 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78924 and previous config saved to /var/cache/conftool/dbconfig/20250711-130528-root.json
  • 13:03 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1034 gradually with 4 steps - Pooling in
  • 13:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1050.eqiad.wmnet with reason: host reimage
  • 12:57 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 12:57 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 12:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1049.eqiad.wmnet with reason: host reimage
  • 12:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1051.eqiad.wmnet with reason: host reimage
  • 12:52 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1003.eqiad.wmnet with OS trixie
  • 12:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2187 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78921 and previous config saved to /var/cache/conftool/dbconfig/20250711-125022-root.json
  • 12:49 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1050.eqiad.wmnet with reason: host reimage
  • 12:49 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1049.eqiad.wmnet with reason: host reimage
  • 12:49 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1051.eqiad.wmnet with reason: host reimage
  • 12:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2187 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78919 and previous config saved to /var/cache/conftool/dbconfig/20250711-124249-marostegui.json
  • 12:42 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 12:38 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 12:33 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 12:30 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) es1032 - Depooling RO host
  • 12:30 fceratto@cumin1002: START - Cookbook sre.mysql.depool es1032 - Depooling RO host
  • 12:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1050.eqiad.wmnet with OS bookworm
  • 12:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1051.eqiad.wmnet with OS bookworm
  • 12:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1049.eqiad.wmnet with OS bookworm
  • 12:28 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) es1032 - Depooling RO host
  • 12:28 fceratto@cumin1002: START - Cookbook sre.mysql.depool es1032 - Depooling RO host
  • 12:28 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1032.eqiad.wmnet with reason: Maintenance
  • 12:24 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:24 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:22 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:22 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:20 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 12:20 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS trixie
  • 12:20 fceratto@cumin1002: START - Cookbook sre.mysql.pool es1034 gradually with 4 steps - Pooling in
  • 12:20 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) es1034 gradually with 4 steps - Pooling in
  • 12:20 fceratto@cumin1002: START - Cookbook sre.mysql.pool es1034 gradually with 4 steps - Pooling in
  • 12:20 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) es1034 gradually with 4 steps - Pooling in
  • 12:20 fceratto@cumin1002: START - Cookbook sre.mysql.pool es1034 gradually with 4 steps - Pooling in
  • 12:19 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:19 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:17 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 12:17 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1034.eqiad.wmnet
  • 12:17 fceratto@cumin1002: START - Cookbook sre.hosts.remove-downtime for es1034.eqiad.wmnet
  • 12:06 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:06 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:04 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:04 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:03 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es1034.eqiad.wmnet
  • 12:01 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:01 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:52 fceratto@cumin1002: START - Cookbook sre.hosts.reboot-single for host es1034.eqiad.wmnet
  • 11:44 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78916 and previous config saved to /var/cache/conftool/dbconfig/20250711-114439-root.json
  • 11:35 fceratto@cumin1002: dbctl commit (dc=all): 'Depool es1034 for upgrade', diff saved to https://phabricator.wikimedia.org/P78915 and previous config saved to /var/cache/conftool/dbconfig/20250711-113532-fceratto.json
  • 11:34 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1034.eqiad.wmnet with reason: Maintenance
  • 11:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1032.eqiad.wmnet with reason: Maintenance
  • 11:30 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1031.eqiad.wmnet
  • 11:30 fceratto@cumin1002: START - Cookbook sre.hosts.remove-downtime for es1031.eqiad.wmnet
  • 11:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 11:29 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78914 and previous config saved to /var/cache/conftool/dbconfig/20250711-112933-root.json
  • 11:26 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 11:26 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1031.eqiad.wmnet with reason: Maintenance
  • 11:14 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78913 and previous config saved to /var/cache/conftool/dbconfig/20250711-111428-root.json
  • 10:59 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78912 and previous config saved to /var/cache/conftool/dbconfig/20250711-105922-root.json
  • 10:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2192 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78911 and previous config saved to /var/cache/conftool/dbconfig/20250711-105039-root.json
  • 10:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2192 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78910 and previous config saved to /var/cache/conftool/dbconfig/20250711-103533-root.json
  • 10:32 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 10:32 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 10:31 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 10:31 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 10:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 10:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 10:26 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1003.eqiad.wmnet with OS trixie
  • 10:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2192 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78909 and previous config saved to /var/cache/conftool/dbconfig/20250711-102027-root.json
  • 10:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2192 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78908 and previous config saved to /var/cache/conftool/dbconfig/20250711-100522-root.json
  • 10:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2192', diff saved to https://phabricator.wikimedia.org/P78907 and previous config saved to /var/cache/conftool/dbconfig/20250711-100106-root.json
  • 10:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2192 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78906 and previous config saved to /var/cache/conftool/dbconfig/20250711-100033-root.json
  • 09:45 marostegui@cumin1002: dbctl commit (dc=all): 'db2192 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78905 and previous config saved to /var/cache/conftool/dbconfig/20250711-094527-root.json
  • 09:39 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 09:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2192 T399280', diff saved to https://phabricator.wikimedia.org/P78904 and previous config saved to /var/cache/conftool/dbconfig/20250711-093115-root.json
  • 09:30 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2213 to s5 primary T399280', diff saved to https://phabricator.wikimedia.org/P78903 and previous config saved to /var/cache/conftool/dbconfig/20250711-093006-marostegui.json
  • 09:29 marostegui: Starting s5 codfw failover from db2192 to db2213 - T399280
  • 09:27 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS trixie
  • 09:25 moritzm: imported perccli for trixie-wikimedia T391083
  • 09:18 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2213 from API/vslow/dump T399280', diff saved to https://phabricator.wikimedia.org/P78902 and previous config saved to /var/cache/conftool/dbconfig/20250711-091812-root.json
  • 09:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 22 hosts with reason: Primary switchover s5 T399280
  • 09:12 marostegui@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78901 and previous config saved to /var/cache/conftool/dbconfig/20250711-091242-root.json
  • 09:04 jmm@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1003.eqiad.wmnet with OS trixie
  • 08:57 marostegui@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78900 and previous config saved to /var/cache/conftool/dbconfig/20250711-085736-root.json
  • 08:51 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 08:51 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 08:51 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 08:51 elukey@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 08:42 marostegui@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78899 and previous config saved to /var/cache/conftool/dbconfig/20250711-084230-root.json
  • 08:42 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 08:41 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 08:34 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetserver2003.codfw.wmnet
  • 08:34 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:34 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetserver2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 08:33 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetserver2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 08:30 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 08:27 marostegui@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78898 and previous config saved to /var/cache/conftool/dbconfig/20250711-082725-root.json
  • 08:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2223 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78897 and previous config saved to /var/cache/conftool/dbconfig/20250711-081953-marostegui.json
  • 08:19 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2223.codfw.wmnet with reason: Maintenance
  • 08:18 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts puppetserver2003.codfw.wmnet
  • 08:02 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 07:56 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 07:36 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS trixie
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78896 and previous config saved to /var/cache/conftool/dbconfig/20250711-072439-root.json
  • 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78895 and previous config saved to /var/cache/conftool/dbconfig/20250711-070933-root.json
  • 06:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78894 and previous config saved to /var/cache/conftool/dbconfig/20250711-065428-root.json
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78893 and previous config saved to /var/cache/conftool/dbconfig/20250711-063922-root.json
  • 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2213 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78892 and previous config saved to /var/cache/conftool/dbconfig/20250711-063156-marostegui.json
  • 06:31 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 03:35 andrew@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts cloudcephosd1041.eqiad.wmnet
  • 03:35 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1041.eqiad.wmnet
  • 03:21 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1041.eqiad.wmnet
  • 03:13 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1041.eqiad.wmnet
  • 02:30 andrew@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts cloudcephosd1040.eqiad.wmnet
  • 02:30 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1040.eqiad.wmnet
  • 02:17 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1040.eqiad.wmnet
  • 02:13 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1040.eqiad.wmnet
  • 02:11 andrew@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cloudcephosd1039.eqiad.wmnet
  • 02:11 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1039.eqiad.wmnet
  • 02:10 andrew@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cloudcephosd1039.eqiad.wmnet
  • 02:10 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1039.eqiad.wmnet
  • 02:09 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1039.eqiad.wmnet
  • 02:01 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1039.eqiad.wmnet
  • 01:17 andrew@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts cloudcephosd1038.eqiad.wmnet
  • 01:17 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1038.eqiad.wmnet
  • 01:03 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1038.eqiad.wmnet
  • 00:57 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1038.eqiad.wmnet
  • 00:55 krinkle@deploy1003: Finished scap sync-world: Backport for beta: Change FileRepo zone URL to upload.wikimedia.beta.wmcloud.org (T289318) (duration: 11m 30s)
  • 00:50 krinkle@deploy1003: krinkle: Continuing with sync
  • 00:47 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1037.eqiad.wmnet with OS bookworm
  • 00:45 krinkle@deploy1003: krinkle: Backport for beta: Change FileRepo zone URL to upload.wikimedia.beta.wmcloud.org (T289318) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:43 krinkle@deploy1003: Started scap sync-world: Backport for beta: Change FileRepo zone URL to upload.wikimedia.beta.wmcloud.org (T289318)
  • 00:34 krinkle@deploy1003: Finished scap sync-world: Backport for beta: Move beta wikipedia canonical to beta.wmcloud.org (T289318) (duration: 12m 13s)
  • 00:29 krinkle@deploy1003: krinkle: Continuing with sync
  • 00:27 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage
  • 00:24 krinkle@deploy1003: krinkle: Backport for beta: Move beta wikipedia canonical to beta.wmcloud.org (T289318) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:22 krinkle@deploy1003: Started scap sync-world: Backport for beta: Move beta wikipedia canonical to beta.wmcloud.org (T289318)
  • 00:21 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage

2025-07-10

  • 23:59 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1037.eqiad.wmnet with OS bookworm
  • 23:57 andrew@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts cloudcephosd1037.eqiad.wmnet
  • 23:57 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1037.eqiad.wmnet
  • 23:44 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1037.eqiad.wmnet
  • 23:03 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1037.eqiad.wmnet
  • 23:02 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1036.eqiad.wmnet with OS bookworm
  • 22:43 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
  • 22:39 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
  • 22:30 zabe@deploy1003: Finished scap sync-world: Backport for Fix categorylinks read new query for excluded categories (T385890) (duration: 07m 59s)
  • 22:25 zabe@deploy1003: zabe: Continuing with sync
  • 22:24 zabe@deploy1003: zabe: Backport for Fix categorylinks read new query for excluded categories (T385890) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:22 zabe@deploy1003: Started scap sync-world: Backport for Fix categorylinks read new query for excluded categories (T385890)
  • 22:16 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1036.eqiad.wmnet with OS bookworm
  • 22:13 andrew@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts cloudcephosd1036.eqiad.wmnet
  • 22:13 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1036.eqiad.wmnet
  • 22:00 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1036.eqiad.wmnet
  • 21:55 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1036.eqiad.wmnet
  • 21:32 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1035.eqiad.wmnet with OS bookworm
  • 21:12 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage
  • 21:06 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage
  • 20:54 jforrester@deploy1003: Finished scap sync-world: Backport for Use `sul` dblist in InitialiseSettings (duration: 11m 43s)
  • 20:48 jforrester@deploy1003: jforrester, bd808: Continuing with sync
  • 20:44 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1035.eqiad.wmnet with OS bookworm
  • 20:44 jforrester@deploy1003: jforrester, bd808: Backport for Use `sul` dblist in InitialiseSettings synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:42 andrew@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 20:42 jforrester@deploy1003: Started scap sync-world: Backport for Use `sul` dblist in InitialiseSettings
  • 20:41 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 20:39 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 20:39 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1035.eqiad.wmnet
  • 20:25 root@cumin1003: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 20:25 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 20:25 robh@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1035.eqiad.wmnet
  • 20:24 root@cumin1003: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 20:24 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 20:21 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 20:14 aqu@deploy1003: Finished deploy [airflow-dags/analytics@c558ea4]: Artifactct analytics / main (duration: 00m 43s)
  • 20:13 aqu@deploy1003: Started deploy [airflow-dags/analytics@c558ea4]: Artifactct analytics / main
  • 20:12 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@c558ea4]: Artifactct analytics-test (duration: 00m 13s)
  • 20:12 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@c558ea4]: Artifactct analytics-test
  • 19:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 19:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
  • 19:22 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
  • 19:07 root@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 19:07 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 19:05 root@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 19:05 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 19:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 19:01 root@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 19:01 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 19:00 root@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 19:00 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 19:00 root@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1035.eqiad.wmnet']
  • 19:00 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1035.eqiad.wmnet']
  • 18:59 andrew@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 18:58 root@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:58 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:50 andrew@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:50 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:50 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:50 andrew@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:49 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:49 andrew@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:48 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:47 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqsin [reason: arelion drained; traffic is going through ulsfo to codfw, T399221]
  • 18:47 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool site eqsin [reason: arelion drained; traffic is going through ulsfo to codfw, T399221]
  • 18:44 andrew@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:44 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:44 andrew@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:44 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:43 andrew@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:43 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:43 andrew@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:43 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:42 andrew@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:42 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:39 sukhe: clearing varnish and ATS cache on cp5017 before repooling eqsin: T399221
  • 18:39 sukhe: sukhe@cp5017:~$ sudo systemctl stop trafficserver.service && sudo traffic_server -C clear_cache && sudo systemctl start trafficserver.service: T399221
  • 18:39 sukhe: sukhe@cp5017:~$ sudo systemctl stop trafficserver.service && sudo traffic_server -C clear_cache && sudo systemctl start trafficserver.service
  • 18:28 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 18:28 andrew@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1048.eqiad.wmnet with OS bullseye
  • 18:19 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
  • 18:18 andrew@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 17:57 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78890 and previous config saved to /var/cache/conftool/dbconfig/20250710-175730-root.json
  • 17:56 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 17:55 andrew@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 17:55 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 17:42 andrew@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 17:42 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78889 and previous config saved to /var/cache/conftool/dbconfig/20250710-174225-root.json
  • 17:33 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 17:28 andrew@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 17:27 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78887 and previous config saved to /var/cache/conftool/dbconfig/20250710-172719-root.json
  • 17:25 root@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1049.eqiad.wmnet
  • 17:25 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1049.eqiad.wmnet
  • 17:22 root@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1049.eqiad.wmnet']
  • 17:22 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1049.eqiad.wmnet']
  • 17:12 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78886 and previous config saved to /var/cache/conftool/dbconfig/20250710-171214-root.json
  • 17:05 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 16:50 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Checking sanitization for wikis mediawikiwiki, testwiki in section s3
  • 16:24 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:22 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:21 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:20 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:58 hnowlan@dns1004: END - running authdns-update
  • 15:57 hnowlan@dns1004: START - running authdns-update
  • 15:54 xcollazo: refreshed YARN queues definition in production via https://phabricator.wikimedia.org/T399013#10992686
  • 15:52 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 15:40 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 15:35 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis mediawikiwiki, testwiki in section s3
  • 15:32 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm
  • 15:30 hnowlan@dns1004: END - running authdns-update
  • 15:29 hnowlan@dns1004: START - running authdns-update
  • 15:25 volans: upgrade spicerack to 11.3.0 on cumin100[2-3]
  • 15:21 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2006.codfw.wmnet with OS bookworm
  • 15:20 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 15:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
  • 15:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
  • 15:11 arnaudb@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on gerrit2003.wikimedia.org with reason: maintenance
  • 15:03 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 15:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 14:56 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1008.eqiad.wmnet with OS bookworm
  • 14:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2006.codfw.wmnet with OS bookworm
  • 14:54 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 14:41 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqsin [reason: no reason specified, no task ID specified]
  • 14:41 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqsin [reason: no reason specified, no task ID specified]
  • 14:41 vgutierrez: depooling eqsin
  • 14:38 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1008.eqiad.wmnet with reason: host reimage
  • 14:34 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1008.eqiad.wmnet with reason: host reimage
  • 14:33 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 14:31 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 14:30 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 14:30 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 14:27 marostegui@cumin1002: dbctl commit (dc=all): 'db2211 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78884 and previous config saved to /var/cache/conftool/dbconfig/20250710-142707-root.json
  • 14:24 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 14:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 14:16 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1008.eqiad.wmnet with OS bookworm
  • 14:15 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 14:12 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 14:12 marostegui@cumin1002: dbctl commit (dc=all): 'db2211 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78883 and previous config saved to /var/cache/conftool/dbconfig/20250710-141202-root.json
  • 14:04 andrew@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1008.eqiad.wmnet']
  • 14:03 elukey@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 14:03 elukey@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
  • 14:03 elukey@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:02 elukey@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 14:01 volans@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
  • 14:00 volans@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1002.eqiad.wmnet
  • 13:58 volans@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
  • 13:57 vgutierrez: restarting varnish and ATS in cp5017
  • 13:56 marostegui@cumin1002: dbctl commit (dc=all): 'db2211 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78882 and previous config saved to /var/cache/conftool/dbconfig/20250710-135656-root.json
  • 13:52 hashar: UTC afternoon backport window completed
  • 13:51 hashar@deploy1003: Finished scap sync-world: Backport for fix(StructuredTask): wrong order in resolving a deferred (duration: 11m 10s)
  • 13:51 volans@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1002.eqiad.wmnet
  • 13:49 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance
  • 13:49 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1008.eqiad.wmnet']
  • 13:48 volans@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
  • 13:47 klausman@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=inference,name=codfw
  • 13:46 volans: upgrade spicerack on cumin2002 to 11.3.0
  • 13:46 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2006.codfw.wmnet with OS trixie
  • 13:46 hashar@deploy1003: migr, hashar: Continuing with sync
  • 13:42 hashar@deploy1003: migr, hashar: Backport for fix(StructuredTask): wrong order in resolving a deferred synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:41 marostegui@cumin1002: dbctl commit (dc=all): 'db2211 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78881 and previous config saved to /var/cache/conftool/dbconfig/20250710-134150-root.json
  • 13:40 hashar@deploy1003: Started scap sync-world: Backport for fix(StructuredTask): wrong order in resolving a deferred
  • 13:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2211 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78880 and previous config saved to /var/cache/conftool/dbconfig/20250710-133418-marostegui.json
  • 13:34 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 13:30 marostegui@cumin1002: dbctl commit (dc=all): 'db2171 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78879 and previous config saved to /var/cache/conftool/dbconfig/20250710-133047-root.json
  • 13:15 marostegui@cumin1002: dbctl commit (dc=all): 'db2171 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78878 and previous config saved to /var/cache/conftool/dbconfig/20250710-131541-root.json
  • 13:08 klausman@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 13:06 moritzm: installing ICU security updates
  • 13:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2171 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78876 and previous config saved to /var/cache/conftool/dbconfig/20250710-130036-root.json
  • 12:59 klausman@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 12:52 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 12:45 marostegui@cumin1002: dbctl commit (dc=all): 'db2171 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78875 and previous config saved to /var/cache/conftool/dbconfig/20250710-124530-root.json
  • 12:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78874 and previous config saved to /var/cache/conftool/dbconfig/20250710-124051-root.json
  • 12:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp - 2.8.15 upgrade (T398720)
  • 12:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2171 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78873 and previous config saved to /var/cache/conftool/dbconfig/20250710-123809-marostegui.json
  • 12:38 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 12:35 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp - 2.8.15 upgrade (T398720)
  • 12:32 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1007.eqiad.wmnet with OS bookworm
  • 12:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78872 and previous config saved to /var/cache/conftool/dbconfig/20250710-122545-root.json
  • 12:25 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 12:17 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.sanitize-wiki (exit_code=97) Managing sanitization for wikis mediawikiwiki, testwiki in section s3
  • 12:15 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis mediawikiwiki, testwiki in section s3
  • 12:14 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1007.eqiad.wmnet with reason: host reimage
  • 12:11 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1007.eqiad.wmnet with reason: host reimage
  • 12:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78871 and previous config saved to /var/cache/conftool/dbconfig/20250710-121039-root.json
  • 12:06 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Managing sanitization for wikis mediawikiwiki, testwiki in section s5
  • 12:01 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp - 2.8.15 upgrade (T398720)
  • 12:00 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp - 2.8.15 upgrade (T398720)
  • 11:57 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams and A:cp - 2.8.15 upgrade (T398720)
  • 11:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78870 and previous config saved to /var/cache/conftool/dbconfig/20250710-115534-root.json
  • 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams and A:cp - 2.8.15 upgrade (T398720)
  • 11:53 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1007.eqiad.wmnet with OS bookworm
  • 11:52 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis mediawikiwiki, testwiki in section s5
  • 11:51 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Checking sanitization for wikis mediawikiwiki, testwiki in section s5
  • 11:49 andrew@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1007.eqiad.wmnet with OS bookworm
  • 11:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1200 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78869 and previous config saved to /var/cache/conftool/dbconfig/20250710-114739-marostegui.json
  • 11:47 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 11:46 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis mediawikiwiki, testwiki in section s5
  • 11:44 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1007.eqiad.wmnet with OS bookworm
  • 11:41 andrew@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1007.eqiad.wmnet with OS bookworm
  • 11:39 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitarium_restart (exit_code=0)
  • 11:35 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 11:35 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 11:35 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 11:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 11:33 fceratto@cumin1002: START - Cookbook sre.mysql.sanitarium_restart
  • 11:30 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 11:30 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1007.eqiad.wmnet with OS bookworm
  • 11:30 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 11:29 andrew@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1007.eqiad.wmnet']
  • 11:24 vgutierrez: rolling restart of purged in eqsin
  • 11:21 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1007.eqiad.wmnet']
  • 11:14 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams and A:cp - 2.8.15 upgrade (T398720)
  • 11:09 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams and A:cp - 2.8.15 upgrade (T398720)
  • 11:06 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade Replica to GitLab 18.0
  • 11:04 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance
  • 11:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1039', diff saved to https://phabricator.wikimedia.org/P78867 and previous config saved to /var/cache/conftool/dbconfig/20250710-110408-marostegui.json
  • 10:33 elukey: kafka preferred-replica-election on kafka-main2010
  • 10:05 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 10:05 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 10:05 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 10:05 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 10:05 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 10:05 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 10:05 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 10:04 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 10:04 vgutierrez: resetting eqiad.resource-topic offsets for cp5017 consumer group
  • 09:45 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 09:45 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 09:44 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 09:44 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 09:43 moritzm: installing initramfs-tools bugfix updates from Bookworm point release
  • 09:15 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2240 gradually with 4 steps - Pooling in
  • 09:15 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2240 gradually with 4 steps - Pooling in
  • 09:14 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2161 gradually with 4 steps - Pooling in
  • 09:14 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2161 gradually with 4 steps - Pooling in
  • 09:12 fceratto@cumin1002: dbctl commit (dc=all): 'Update db2240 T397163', diff saved to https://phabricator.wikimedia.org/P78865 and previous config saved to /var/cache/conftool/dbconfig/20250710-091250-fceratto.json
  • 09:05 vgutierrez: restarting purged on cp5017
  • 09:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw and A:cp - 2.8.15 upgrade (T398720)
  • 08:51 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw and A:cp - 2.8.15 upgrade (T398720)
  • 08:45 moritzm: installing setuptools security updates
  • 08:40 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 08:40 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 08:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2178 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78863 and previous config saved to /var/cache/conftool/dbconfig/20250710-083719-root.json
  • 08:31 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 08:30 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 08:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2178 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78861 and previous config saved to /var/cache/conftool/dbconfig/20250710-082213-root.json
  • 08:15 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw and A:cp - 2.8.15 upgrade (T398720)
  • 08:12 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw and A:cp - 2.8.15 upgrade (T398720)
  • 08:11 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.9 refs T392179
  • 08:10 moritzm: installing containerd security updates
  • 08:07 klausman@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=inference,name=codfw
  • 08:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2178 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78860 and previous config saved to /var/cache/conftool/dbconfig/20250710-080708-root.json
  • 08:07 klausman@cumin1002: conftool action : get/pooled; selector: dnsdisc=inference,name=codfw
  • 08:05 klausman: Depooling Liftwing prod in codfw so we can roll out some changes that restart all services (cf. T398533)
  • 08:00 moritzm: installing python-urllib3 security updates
  • 07:55 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 07:54 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance
  • 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2178 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78859 and previous config saved to /var/cache/conftool/dbconfig/20250710-075202-root.json
  • 07:51 vgutierrez: switching to upload cert globally on upload CDN cluster - T394484
  • 07:47 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: sync
  • 07:44 elukey@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: sync
  • 07:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2178 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78858 and previous config saved to /var/cache/conftool/dbconfig/20250710-074432-marostegui.json
  • 07:44 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 07:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru and A:cp - 2.8.15 upgrade (T398720)
  • 07:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru and A:cp - 2.8.15 upgrade (T398720)
  • 07:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1210 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78857 and previous config saved to /var/cache/conftool/dbconfig/20250710-073907-root.json
  • 07:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2228 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78856 and previous config saved to /var/cache/conftool/dbconfig/20250710-073123-root.json
  • 07:29 hashar: Restarting CI Jenkins
  • 07:25 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1210 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78855 and previous config saved to /var/cache/conftool/dbconfig/20250710-072401-root.json
  • 07:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1217.eqiad.wmnet with reason: Maintenance
  • 07:17 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2228 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78853 and previous config saved to /var/cache/conftool/dbconfig/20250710-071616-root.json
  • 07:10 kartik@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 07:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1210 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78852 and previous config saved to /var/cache/conftool/dbconfig/20250710-070855-root.json
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2228 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78851 and previous config saved to /var/cache/conftool/dbconfig/20250710-070111-root.json
  • 07:00 moritzm: installing libbpf security updates
  • 06:59 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru and A:cp - 2.8.15 upgrade (T398720)
  • 06:59 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 06:58 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 06:58 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 06:55 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1210 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78849 and previous config saved to /var/cache/conftool/dbconfig/20250710-065350-root.json
  • 06:52 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 06:52 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru and A:cp - 2.8.15 upgrade (T398720)
  • 06:49 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 06:47 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade Replica to GitLab 18.0
  • 06:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2228 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78848 and previous config saved to /var/cache/conftool/dbconfig/20250710-064605-root.json
  • 06:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1210 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78847 and previous config saved to /var/cache/conftool/dbconfig/20250710-064558-marostegui.json
  • 06:45 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 06:44 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 06:44 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 06:39 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2228.codfw.wmnet with reason: Maintenance
  • 06:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2228 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78846 and previous config saved to /var/cache/conftool/dbconfig/20250710-063535-marostegui.json
  • 06:35 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2228.codfw.wmnet with reason: Maintenance
  • 06:25 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance
  • 05:54 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 05:38 kartik@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 05:22 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Bugfixes - oblivian@cumin1003"
  • 05:22 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Bugfixes - oblivian@cumin1003
  • 05:21 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Bugfixes - oblivian@cumin1003
  • 05:21 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Bugfixes - oblivian@cumin1003"
  • 04:58 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 04:57 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 04:56 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 04:55 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 04:32 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 04:32 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 04:29 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 04:18 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 04:17 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 04:16 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1006.eqiad.wmnet with OS bookworm
  • 04:16 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 04:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 04:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 04:12 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 04:10 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 03:58 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage
  • 03:55 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage
  • 03:37 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bookworm
  • 03:36 andrew@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1006.eqiad.wmnet with OS bookworm
  • 03:01 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bookworm
  • 03:01 andrew@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1006.eqiad.wmnet with OS bookworm
  • 02:53 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bookworm
  • 02:46 root@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 02:39 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 02:37 root@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 02:29 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 02:29 root@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 02:28 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 02:11 root@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 02:04 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 02:04 root@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 02:03 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 02:03 root@cumin1003: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 02:03 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 01:55 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudcephosd1006.eqiad.wmnet
  • 01:53 andrew@cumin1003: START - Cookbook sre.hosts.dhcp for host cloudcephosd1006.eqiad.wmnet
  • 01:53 andrew@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1006.eqiad.wmnet with OS bookworm
  • 01:39 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bookworm
  • 00:42 andrew@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1006.eqiad.wmnet with OS bookworm

2025-07-09

  • 23:21 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bookworm
  • 22:29 dreamyjazz@deploy1003: Finished scap sync-world: Backport for ukwiki: allow bureaucrats to assign and remove temporary-account-viewer group (T398738) (duration: 10m 18s)
  • 22:23 dreamyjazz@deploy1003: dreamyjazz, dreamrimmer: Continuing with sync
  • 22:21 dreamyjazz@deploy1003: dreamyjazz, dreamrimmer: Backport for ukwiki: allow bureaucrats to assign and remove temporary-account-viewer group (T398738) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:18 dreamyjazz@deploy1003: Started scap sync-world: Backport for ukwiki: allow bureaucrats to assign and remove temporary-account-viewer group (T398738)
  • 21:46 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1048.eqiad.wmnet with OS bookworm
  • 21:46 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 21:45 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 21:28 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1048.eqiad.wmnet with reason: host reimage
  • 21:24 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1048.eqiad.wmnet with reason: host reimage
  • 21:15 dancy@deploy1003: Finished scap sync-world: Backport for Add DEPRECATED_LANGUAGE_CODE_MAPPING to wgInterlanguageLinkCodeMap (T248352) (duration: 10m 25s)
  • 21:10 dancy@deploy1003: dancy, fomafix: Continuing with sync
  • 21:07 dancy@deploy1003: dancy, fomafix: Backport for Add DEPRECATED_LANGUAGE_CODE_MAPPING to wgInterlanguageLinkCodeMap (T248352) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:05 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host es1048.eqiad.wmnet with OS bookworm
  • 21:05 dancy@deploy1003: Started scap sync-world: Backport for Add DEPRECATED_LANGUAGE_CODE_MAPPING to wgInterlanguageLinkCodeMap (T248352)
  • 20:55 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host es1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1051.eqiad.wmnet with OS bullseye
  • 20:36 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:35 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host es1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:34 vriley@cumin1002: START - Cookbook sre.hosts.provision for host es1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1050.eqiad.wmnet with OS bullseye
  • 20:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:28 jforrester@deploy1003: Finished scap sync-world: Backport for Pre-deploy Readers Use Cases Survey on enwiki (T398870) (duration: 11m 00s)
  • 20:23 jforrester@deploy1003: jforrester, dani: Continuing with sync
  • 20:19 jforrester@deploy1003: jforrester, dani: Backport for Pre-deploy Readers Use Cases Survey on enwiki (T398870) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:17 jforrester@deploy1003: Started scap sync-world: Backport for Pre-deploy Readers Use Cases Survey on enwiki (T398870)
  • 20:17 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1051.eqiad.wmnet with reason: host reimage
  • 20:13 James_F: jforrester@deploy1003:~$ echo 'https://en.wikipedia.org/static/favicon/wikifunctions.ico' | mwscript-k8s --attach purgeList.php -- --wiki enwiki # T326094
  • 20:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1050.eqiad.wmnet with reason: host reimage
  • 20:13 jforrester@deploy1003: Finished scap sync-world: Backport for shwiki: Add bs, hr and sr as import sources (T399113), Remove white outline from Wikifunctions favicon (T326094) (duration: 08m 52s)
  • 20:10 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1047.eqiad.wmnet with OS bookworm
  • 20:10 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 20:10 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 20:10 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1051.eqiad.wmnet with reason: host reimage
  • 20:08 jforrester@deploy1003: jforrester, jhsoby, aleksandar: Continuing with sync
  • 20:07 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1050.eqiad.wmnet with reason: host reimage
  • 20:06 jforrester@deploy1003: jforrester, jhsoby, aleksandar: Backport for shwiki: Add bs, hr and sr as import sources (T399113), Remove white outline from Wikifunctions favicon (T326094) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:04 jforrester@deploy1003: Started scap sync-world: Backport for shwiki: Add bs, hr and sr as import sources (T399113), Remove white outline from Wikifunctions favicon (T326094)
  • 19:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1051.eqiad.wmnet with OS bullseye
  • 19:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1051.eqiad.wmnet with OS bullseye
  • 19:51 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1051.eqiad.wmnet with OS bullseye
  • 19:51 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1050.eqiad.wmnet with OS bullseye
  • 19:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:47 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1047.eqiad.wmnet with reason: host reimage
  • 19:43 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye
  • 19:42 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1047.eqiad.wmnet with reason: host reimage
  • 19:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:24 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host es1047.eqiad.wmnet with OS bookworm
  • 19:21 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:16 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:16 tchin@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:16 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:16 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:16 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:16 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:16 vriley@cumin1002: START - Cookbook sre.hosts.provision for host es1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:12 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host es1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:10 vriley@cumin1002: START - Cookbook sre.hosts.provision for host es1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:45 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Upgrade
  • 18:42 sukhe: re-adding ocsp from deployment-prep: commit 3307286: T399114: will remove after Puppet removal
  • 18:40 sukhe: removing ocsp from deployment-prep: commit 3307286: T399114
  • 18:35 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Upgrade
  • 18:33 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Upgrade
  • 18:23 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Upgrade
  • 18:23 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye
  • 17:34 sukhe: re-enabling Puppet on P{ganeti7002* or ganeti7003*}: it was left disabled there during rollout of CR 1166222 by sukhe
  • 16:50 bking@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for wdqs2023.codfw.wmnet: Renew puppet certificate - bking@cumin1002
  • 16:17 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:17 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon1004.eqiad.wmnet with OS bookworm
  • 16:15 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:08 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 16:08 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 16:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 16:07 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 16:07 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 16:06 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 16:06 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 16:02 vgutierrez: switching esams, eqsin and drmrs to Let's Encrypt unified/upload certs - T398596
  • 15:57 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage
  • 15:54 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage
  • 15:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:43 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:43 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:40 volans: uploaded spicerack_11.3.0 to apt.wikimedia.org bullseye-wikimedia,bookworm-wikimedia
  • 15:37 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:33 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bookworm
  • 15:32 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:25 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:23 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: cirrussearch@eqiad
  • 15:23 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
  • 15:22 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:22 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:20 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
  • 15:19 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:12 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1051.eqiad.wmnet with OS bullseye
  • 15:10 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: cirrussearch@eqiad
  • 15:09 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=99) for alias: cirrussearch-eqiad-omega@eqiad
  • 15:05 zabe@deploy1003: Finished scap sync-world: Backport for Revert^2 "Enable categorylinks read new on a few large wikis" (duration: 08m 11s)
  • 15:04 moritzm: installing abseil security updates
  • 15:04 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: cirrussearch-eqiad-omega@eqiad
  • 15:03 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=99) for alias: cirrussearch-eqiad-psi@eqiad
  • 15:00 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:00 zabe@deploy1003: zabe: Continuing with sync
  • 14:59 zabe@deploy1003: zabe: Backport for Revert^2 "Enable categorylinks read new on a few large wikis" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:57 zabe@deploy1003: Started scap sync-world: Backport for Revert^2 "Enable categorylinks read new on a few large wikis"
  • 14:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:57 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: cirrussearch-eqiad-psi@eqiad
  • 14:49 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:49 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for rgw.codfw.dpe.anycast.wmnet - cmooney@cumin1003"
  • 14:49 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for rgw.codfw.dpe.anycast.wmnet - cmooney@cumin1003"
  • 14:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 14:44 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78842 and previous config saved to /var/cache/conftool/dbconfig/20250709-144440-root.json
  • 14:43 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1050.eqiad.wmnet with OS bullseye
  • 14:42 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78841 and previous config saved to /var/cache/conftool/dbconfig/20250709-144250-root.json
  • 14:41 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: cirrussearch@codfw
  • 14:41 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
  • 14:40 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
  • 14:35 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2004-dev.codfw.wmnet with OS bookworm
  • 14:34 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet
  • 14:34 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1051.eqiad.wmnet with OS bullseye
  • 14:30 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: cirrussearch@codfw
  • 14:30 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=99) for alias: cirrussearch-codfw-omega@codfw
  • 14:29 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78840 and previous config saved to /var/cache/conftool/dbconfig/20250709-142934-root.json
  • 14:27 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78839 and previous config saved to /var/cache/conftool/dbconfig/20250709-142744-root.json
  • 14:24 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: cirrussearch-codfw-omega@codfw
  • 14:23 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=99) for alias: cirrussearch-codfw-psi@codfw
  • 14:23 moritzm: installing bash updates from Bookworm point release
  • 14:17 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2004-dev.codfw.wmnet with reason: host reimage
  • 14:17 ecarg@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:17 ecarg@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:17 ecarg@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:16 ecarg@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:16 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: cirrussearch-codfw-psi@codfw
  • 14:15 ecarg@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:14 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2004-dev.codfw.wmnet with reason: host reimage
  • 14:14 ecarg@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:14 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78838 and previous config saved to /var/cache/conftool/dbconfig/20250709-141428-root.json
  • 14:12 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78837 and previous config saved to /var/cache/conftool/dbconfig/20250709-141238-root.json
  • 14:11 ecarg@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:10 ecarg@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:10 ecarg@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:09 ecarg@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:08 ecarg@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:07 ecarg@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:01 zabe@deploy1003: Finished scap sync-world: Backport for ApiQueryCategoryMembers: Try stop forcing index in read new code (T399037), ApiQueryCategoryMembers: Try stop forcing index in read new code (T399037), Fix categorylinks read new code for excluding categories (T398861 T398939), [[gerrit:1167569|Fix categorylinks read new code for excluding categories (T3988
  • 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 13:59 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78836 and previous config saved to /var/cache/conftool/dbconfig/20250709-135923-root.json
  • 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 13:57 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78835 and previous config saved to /var/cache/conftool/dbconfig/20250709-135732-root.json
  • 13:57 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 13:57 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 13:57 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 13:56 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 13:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1050.eqiad.wmnet with OS bullseye
  • 13:55 zabe@deploy1003: zabe: Continuing with sync
  • 13:55 sukhe@dns1004: END - running authdns-update
  • 13:55 sukhe@dns1004: START - running authdns-update
  • 13:54 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7002.wikimedia.org,service=authdns-update [reason: testing alert]
  • 13:54 zabe@deploy1003: zabe: Backport for ApiQueryCategoryMembers: Try stop forcing index in read new code (T399037), ApiQueryCategoryMembers: Try stop forcing index in read new code (T399037), Fix categorylinks read new code for excluding categories (T398861 T398939), Fix categorylinks read new code for excluding categories (T398861 T398939) synced
  • 13:54 hnowlan: delete three wedged thumbor pods showing signs of T374350
  • 13:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1051.eqiad.wmnet with OS bullseye
  • 13:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1050.eqiad.wmnet with OS bullseye
  • 13:53 cgoubert@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=helm-charts.*,name=eqiad
  • 13:53 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2004-dev.codfw.wmnet with OS bookworm
  • 13:53 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum1001.eqiad.wmnet
  • 13:52 zabe@deploy1003: Started scap sync-world: Backport for ApiQueryCategoryMembers: Try stop forcing index in read new code (T399037), ApiQueryCategoryMembers: Try stop forcing index in read new code (T399037), Fix categorylinks read new code for excluding categories (T398861 T398939), [[gerrit:1167569|Fix categorylinks read new code for excluding categories (T39886
  • 13:51 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host chartmuseum1001.eqiad.wmnet
  • 13:50 cgoubert@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=helm-charts.*,name=eqiad
  • 13:50 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7002.wikimedia.org,service=authdns-update [reason: testing alert]
  • 13:50 claime: Depooling chartmuseum in eqiad
  • 13:50 cgoubert@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=helm-charts.*,name=codfw
  • 13:50 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum2001.codfw.wmnet
  • 13:49 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host es1044.eqiad.wmnet
  • 13:46 vgutierrez: deploy measure/measure-goog certs in the upload CDN cluster - T394484
  • 13:46 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host chartmuseum2001.codfw.wmnet
  • 13:46 cgoubert@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=helm-charts.*,name=codfw
  • 13:45 claime: Depooling chartmuseum in codfw
  • 13:45 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:45 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:44 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:44 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es1041.eqiad.wmnet
  • 13:42 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk1003.eqiad.wmnet
  • 13:41 Daimona: mwscript-k8s --comment="T397270" -f --file /srv/mediawiki/php-1.45.0-wmf.9/extensions/CampaignEvents/maintenance/countryExceptionMappings.csv -- CampaignEvents:UpdateCountriesColumn --wiki test2wiki --exceptions countryExceptionMappings.csv
  • 13:40 Daimona: mwscript-k8s --comment="T397270" -f --file /srv/mediawiki/php-1.45.0-wmf.9/extensions/CampaignEvents/maintenance/countryExceptionMappings.csv -- CampaignEvents:UpdateCountriesColumn --wiki testwiki --exceptions countryExceptionMappings.csv
  • 13:39 Daimona: mwscript-k8s --comment="T397270" -f --file /srv/mediawiki/php-1.45.0-wmf.9/extensions/CampaignEvents/maintenance/countryExceptionMappings.csv -- CampaignEvents:UpdateCountriesColumn --wiki officewiki --exceptions countryExceptionMappings.csv
  • 13:38 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host flink-zk1003.eqiad.wmnet
  • 13:38 marostegui@cumin1002: START - Cookbook sre.hosts.reboot-single for host es1044.eqiad.wmnet
  • 13:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1044 for upgrade', diff saved to https://phabricator.wikimedia.org/P78829 and previous config saved to /var/cache/conftool/dbconfig/20250709-133639-marostegui.json
  • 13:36 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1044.eqiad.wmnet with reason: Maintenance
  • 13:36 Daimona: mwscript-k8s --comment="T397270" -f --file /srv/mediawiki/php-1.45.0-wmf.9/extensions/CampaignEvents/maintenance/countryExceptionMappings.csv -- CampaignEvents:UpdateCountriesColumn --wiki metawiki --exceptions countryExceptionMappings.csv
  • 13:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk1002.eqiad.wmnet
  • 13:31 marostegui@cumin1002: START - Cookbook sre.hosts.reboot-single for host es1041.eqiad.wmnet
  • 13:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1041.eqiad.wmnet with reason: Maintenance
  • 13:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es1041.eqiad.wmnet
  • 13:30 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host flink-zk1002.eqiad.wmnet
  • 13:26 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp - 2.8.15 upgrade (T398720)
  • 13:25 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp - 2.8.15 upgrade (T398720)
  • 13:24 marostegui@cumin1002: START - Cookbook sre.hosts.reboot-single for host es1041.eqiad.wmnet
  • 13:21 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1041.eqiad.wmnet with reason: Maintenance
  • 13:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1041', diff saved to https://phabricator.wikimedia.org/P78828 and previous config saved to /var/cache/conftool/dbconfig/20250709-132111-marostegui.json
  • 13:20 sgimeno@deploy1003: Finished scap sync-world: Backport for Add new script to update old freetext country data new schema (T397270), Growth: Enable limiting Add Link for dewiki (T396382) (duration: 10m 07s)
  • 13:18 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:15 sgimeno@deploy1003: mhorsey, sgimeno, migr: Continuing with sync
  • 13:12 sgimeno@deploy1003: mhorsey, sgimeno, migr: Backport for Add new script to update old freetext country data new schema (T397270), Growth: Enable limiting Add Link for dewiki (T396382) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:11 brouberol@cumin1003: END (ERROR) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=97) rolling restart_daemons on A:kafka-test-eqiad
  • 13:10 sgimeno@deploy1003: Started scap sync-world: Backport for Add new script to update old freetext country data new schema (T397270), Growth: Enable limiting Add Link for dewiki (T396382)
  • 13:09 brouberol@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad
  • 13:01 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1050.eqiad.wmnet with OS bullseye
  • 12:59 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1051.eqiad.wmnet with OS bullseye
  • 12:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk1001.eqiad.wmnet
  • 12:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1049.eqiad.wmnet with OS bullseye
  • 12:58 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 12:57 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 12:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1048.eqiad.wmnet with OS bullseye
  • 12:55 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 12:54 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 12:54 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host flink-zk1001.eqiad.wmnet
  • 12:54 moritzm: installing jetty9 security updates
  • 12:50 hashar@deploy1003: Finished deploy [gerrit/gerrit@9666238]: Add readonly plugin - T387833 (duration: 00m 10s)
  • 12:50 hashar@deploy1003: Started deploy [gerrit/gerrit@9666238]: Add readonly plugin - T387833
  • 12:48 hashar@deploy1003: Finished deploy [gerrit/gerrit@9666238]: Add readonly plugin - T387833 (duration: 00m 11s)
  • 12:48 hashar@deploy1003: Started deploy [gerrit/gerrit@9666238]: Add readonly plugin - T387833
  • 12:44 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1051
  • 12:44 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1051
  • 12:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1049.eqiad.wmnet with reason: host reimage
  • 12:39 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1051
  • 12:39 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1051
  • 12:39 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1050
  • 12:39 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1050
  • 12:39 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp - 2.8.15 upgrade (T398720)
  • 12:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
  • 12:37 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp - 2.8.15 upgrade (T398720)
  • 12:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:34 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1049.eqiad.wmnet with reason: host reimage
  • 12:34 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
  • 12:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp[4052].ulsfo.wmnet} and A:cp - 2.8.15 upgrade (T398720)
  • 12:25 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp[4052].ulsfo.wmnet} and A:cp - 2.8.15 upgrade (T398720)
  • 12:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1049.eqiad.wmnet with OS bullseye
  • 12:16 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
  • 12:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1049.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:14 moritzm: installing openjdk-17 security updates
  • 12:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:09 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade Replica to GitLab 18.0
  • 12:08 brouberol@cumin1003: END (ERROR) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=97) rolling restart_daemons on A:kafka-jumbo-eqiad
  • 12:08 moritzm: installing nginx security updates
  • 12:07 brouberol@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-jumbo-eqiad
  • 12:07 brouberol@cumin1003: END (FAIL) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=1) rolling restart_daemons on A:kafka-jumbo-eqiad
  • 12:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 12:02 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 12:02 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 12:02 brouberol@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-jumbo-eqiad
  • 12:01 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 11:57 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:57 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns cloudcephosd1048,49 - jclark@cumin1002"
  • 11:57 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns cloudcephosd1048,49 - jclark@cumin1002"
  • 11:57 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 11:56 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 11:56 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 11:56 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 11:55 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 11:55 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 11:54 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 11:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 11:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 11:52 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 11:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 11:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1049.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:50 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:49 cgoubert@deploy1003: Finished scap sync-world: Backport for PS.php: Restore poolcounter config post-reboot (T395240) (duration: 08m 39s)
  • 11:49 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:48 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2003.codfw.wmnet
  • 11:48 fabfur: puppet enabled again on A:cp (T399071)
  • 11:45 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet
  • 11:44 cgoubert@deploy1003: cgoubert: Continuing with sync
  • 11:43 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host flink-zk2003.codfw.wmnet
  • 11:43 cgoubert@deploy1003: cgoubert: Backport for PS.php: Restore poolcounter config post-reboot (T395240) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:42 cmooney@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1073
  • 11:41 cmooney@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1073
  • 11:41 cgoubert@deploy1003: Started scap sync-world: Backport for PS.php: Restore poolcounter config post-reboot (T395240)
  • 11:38 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1006.eqiad.wmnet
  • 11:38 marostegui@cumin1002: dbctl commit (dc=all): 'pool pc1', diff saved to https://phabricator.wikimedia.org/P78826 and previous config saved to /var/cache/conftool/dbconfig/20250709-113831-marostegui.json
  • 11:37 marostegui@cumin1002: dbctl commit (dc=all): 'depool pc1', diff saved to https://phabricator.wikimedia.org/P78824 and previous config saved to /var/cache/conftool/dbconfig/20250709-113717-marostegui.json
  • 11:37 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2002.codfw.wmnet
  • 11:35 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2005.codfw.wmnet
  • 11:34 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host poolcounter1006.eqiad.wmnet
  • 11:34 marostegui@cumin1002: dbctl commit (dc=all): 'pool pc2011', diff saved to https://phabricator.wikimedia.org/P78823 and previous config saved to /var/cache/conftool/dbconfig/20250709-113413-marostegui.json
  • 11:33 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'depool pc2011', diff saved to https://phabricator.wikimedia.org/P78821 and previous config saved to /var/cache/conftool/dbconfig/20250709-113322-marostegui.json
  • 11:33 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host flink-zk2002.codfw.wmnet
  • 11:32 slyngshede@cumin1003: END (FAIL) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=1) rolling upgrade of HAProxy on P{cp[4052].ulsfo.wmnet} and A:cp - 2.8.15 upgrade (T398720)
  • 11:32 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 11:31 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host poolcounter2005.codfw.wmnet
  • 11:31 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp[4052].ulsfo.wmnet} and A:cp - 2.8.15 upgrade (T398720)
  • 11:29 cgoubert@deploy1003: Finished scap sync-world: Backport for PS.php: Disable primary poolcounters for reboot (T395240) (duration: 08m 19s)
  • 11:28 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 11:28 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 11:24 cgoubert@deploy1003: cgoubert: Continuing with sync
  • 11:23 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet
  • 11:23 cgoubert@deploy1003: cgoubert: Backport for PS.php: Disable primary poolcounters for reboot (T395240) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:21 cgoubert@deploy1003: Started scap sync-world: Backport for PS.php: Disable primary poolcounters for reboot (T395240)
  • 11:14 slyngshede@cumin1003: END (FAIL) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=1) rolling upgrade of HAProxy on A:cp-ulsfo and not P{cp[4037,4045].ulsfo.wmnet} and A:cp - 2.8.15 upgrade (T398720)
  • 11:13 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1007.eqiad.wmnet
  • 11:10 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host poolcounter1007.eqiad.wmnet
  • 11:09 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2006.codfw.wmnet
  • 11:09 fabfur: disable puppet on A:cp to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/1167530
  • 11:06 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host poolcounter2006.codfw.wmnet
  • 11:05 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti1053.eqiad.wmnet with OS bookworm
  • 11:02 cgoubert@deploy1003: Finished scap sync-world: Backport for PS.php: Disable secondary poolcounters for reboot (T395240) (duration: 09m 30s)
  • 10:59 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti1053.eqiad.wmnet with OS bookworm
  • 10:55 cgoubert@deploy1003: cgoubert: Continuing with sync
  • 10:54 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 10:54 cgoubert@deploy1003: cgoubert: Backport for PS.php: Disable secondary poolcounters for reboot (T395240) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:53 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@3a0cdd4]: bump image suggestions to v1.8.0 (duration: 00m 48s)
  • 10:52 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@3a0cdd4]: bump image suggestions to v1.8.0
  • 10:52 cgoubert@deploy1003: Started scap sync-world: Backport for PS.php: Disable secondary poolcounters for reboot (T395240)
  • 10:51 elukey@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:51 elukey@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:49 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 10:39 btullis@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cephosd1001.eqiad.wmnet
  • 10:38 btullis@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cephosd1001.eqiad.wmnet
  • 10:38 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 10:37 claime: Restoring memory limits on mw-cron - T395436 - T395465
  • 10:36 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 10:30 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2001.codfw.wmnet
  • 10:24 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2001.codfw.wmnet
  • 10:20 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host flink-zk2001.codfw.wmnet
  • 10:14 claime: Cutting off access to mwmaint servers - T397017
  • 10:13 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host cephosd2001.codfw.wmnet
  • 10:13 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cephosd2001.codfw.wmnet
  • 10:06 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-ulsfo and not P{cp[4037,4045].ulsfo.wmnet} and A:cp - 2.8.15 upgrade (T398720)
  • 10:04 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 10:04 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 10:03 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 09:58 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp[4037,4045].ulsfo.wmnet} and A:cp - 2.8.15 upgrade (T398720)
  • 09:56 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 09:48 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp[4037,4045].ulsfo.wmnet} and A:cp - 2.8.15 upgrade (T398720)
  • 09:45 moritzm: installing Zookeeper security updates on zk-flink
  • 09:23 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host cephosd2001.codfw.wmnet
  • 09:21 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 09:19 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetserver1001.eqiad.wmnet
  • 09:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetserver1001.eqiad.wmnet
  • 09:01 slyngs: Upgrade completed Netbox v4.0.11 T397300
  • 08:42 slyngshede@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Release v4.0.11 to production - slyngshede@cumin1003
  • 08:35 btullis@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cephosd2001.codfw.wmnet
  • 08:34 btullis@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cephosd2001.codfw.wmnet
  • 08:29 slyngshede@cumin1003: START - Cookbook sre.deploy.python-code netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Release v4.0.11 to production - slyngshede@cumin1003
  • 08:28 slyngshede@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Release v4.0.11 to production - slyngshede@cumin1002
  • 08:21 slyngshede@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Release v4.0.11 to production - slyngshede@cumin1002
  • 08:20 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.9 refs T392179
  • 08:18 slyngs: Deploying Netbox v4.0.11 to production T397300
  • 08:17 slyngshede@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 08:17 slyngshede@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 08:09 aklapper@deploy1003: Finished scap sync-world: Backport for Remove stdClass type hint from ApiFeedContributions::feedItem() for now (T398925) (duration: 08m 21s)
  • 08:04 aklapper@deploy1003: zabe, aklapper: Continuing with sync
  • 08:03 aklapper@deploy1003: zabe, aklapper: Backport for Remove stdClass type hint from ApiFeedContributions::feedItem() for now (T398925) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:01 aklapper@deploy1003: Started scap sync-world: Backport for Remove stdClass type hint from ApiFeedContributions::feedItem() for now (T398925)
  • 07:58 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade Replica to GitLab 18.0
  • 07:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 07:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 07:50 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 07:50 fceratto@cumin1002: START - Cookbook sre.mysql.parsercache
  • 07:42 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99)
  • 07:42 fceratto@cumin1002: START - Cookbook sre.mysql.parsercache
  • 07:42 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99)
  • 07:42 fceratto@cumin1002: START - Cookbook sre.mysql.parsercache
  • 07:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance
  • 07:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1036', diff saved to https://phabricator.wikimedia.org/P78817 and previous config saved to /var/cache/conftool/dbconfig/20250709-073458-marostegui.json
  • 07:32 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 07:32 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 07:31 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 07:31 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 07:31 kartik@deploy1003: Finished scap sync-world: Backport for CX: Add virtual-cx-shared DatabaseVirtualDomains (T348513) (duration: 25m 21s)
  • 07:31 moritzm: installing nginx security updates
  • 07:26 kartik@deploy1003: kartik, abi: Continuing with sync
  • 07:23 elukey: upload python3-docker-report 0.0.16 to bookworm-wikimedia
  • 07:23 elukey: upload python3-docker-report to bookworm-wikimedia
  • 07:20 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: sync
  • 07:08 kartik@deploy1003: kartik, abi: Backport for CX: Add virtual-cx-shared DatabaseVirtualDomains (T348513) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:05 kartik@deploy1003: Started scap sync-world: Backport for CX: Add virtual-cx-shared DatabaseVirtualDomains (T348513)
  • 07:05 elukey@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: sync
  • 06:47 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2232].codfw.wmnet,db[1207,1217].eqiad.wmnet with reason: migration to mariadb 10.11
  • 06:36 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 06:29 marostegui: Failover m3 from db1213 to db1250 - T398818
  • 06:21 kartik@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 06:20 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2234].codfw.wmnet,db[1213,1217,1250].eqiad.wmnet with reason: m3 master switchover T398818
  • 06:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2234].codfw.wmnet,db[1213,1217,1250].eqiad.wmnet with reason: m3 master switchover T398818
  • 06:13 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 05:58 kartik@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 04:23 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 04:23 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply

2025-07-08

  • 23:58 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 23:58 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 23:43 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1048.eqiad.wmnet with OS bookworm
  • 23:43 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 23:43 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 23:34 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1053.eqiad.wmnet with OS bookworm
  • 23:19 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1048.eqiad.wmnet with reason: host reimage
  • 23:15 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1048.eqiad.wmnet with reason: host reimage
  • 23:09 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1047.eqiad.wmnet with OS bookworm
  • 23:09 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 23:06 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 22:53 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host es1048.eqiad.wmnet with OS bookworm
  • 22:43 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1047.eqiad.wmnet with reason: host reimage
  • 22:38 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1047.eqiad.wmnet with reason: host reimage
  • 22:27 zabe@deploy1003: Finished scap sync-world: Backport for Revert "Enable categorylinks read new on a few large wikis" (duration: 08m 38s)
  • 22:21 zabe@deploy1003: zabe: Continuing with sync
  • 22:20 zabe@deploy1003: zabe: Backport for Revert "Enable categorylinks read new on a few large wikis" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:18 zabe@deploy1003: Started scap sync-world: Backport for Revert "Enable categorylinks read new on a few large wikis"
  • 22:18 zabe@deploy1003: Finished scap sync-world: Backport for Remove stdClass type hint from ApiFeedContributions::feedItem() for now (T398925) (duration: 08m 33s)
  • 22:16 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host es1047.eqiad.wmnet with OS bookworm
  • 22:13 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1053.eqiad.wmnet with OS bookworm
  • 22:13 zabe@deploy1003: zabe: Continuing with sync
  • 22:12 zabe@deploy1003: zabe: Backport for Remove stdClass type hint from ApiFeedContributions::feedItem() for now (T398925) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:09 zabe@deploy1003: Started scap sync-world: Backport for Remove stdClass type hint from ApiFeedContributions::feedItem() for now (T398925)
  • 22:08 zabe@deploy1003: Finished scap sync-world: Backport for Enable categorylinks read new on a few large wikis (T397912) (duration: 08m 19s)
  • 22:03 zabe@deploy1003: zabe: Continuing with sync
  • 22:02 zabe@deploy1003: zabe: Backport for Enable categorylinks read new on a few large wikis (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:01 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:00 zabe@deploy1003: Started scap sync-world: Backport for Enable categorylinks read new on a few large wikis (T397912)
  • 22:00 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:56 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2005-dev.codfw.wmnet with OS bookworm
  • 21:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:51 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 21:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 21:47 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1048
  • 21:47 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1048
  • 21:47 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1049
  • 21:47 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1049
  • 21:45 vriley@cumin1002: START - Cookbook sre.hosts.provision for host es1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:44 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:43 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:41 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 21:40 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:38 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1048
  • 21:38 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1048
  • 21:38 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1049
  • 21:38 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1049
  • 21:38 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2005-dev.codfw.wmnet with reason: host reimage
  • 21:38 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudcephosd1051
  • 21:38 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1051
  • 21:37 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:37 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cloudcephosd1048,49 - jclark@cumin1002"
  • 21:37 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cloudcephosd1048,49 - jclark@cumin1002"
  • 21:34 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 21:33 vriley@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:33 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2005-dev.codfw.wmnet with reason: host reimage
  • 21:31 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 21:31 vriley@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:31 vriley@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1048 - vriley@cumin1002"
  • 21:28 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1048 - vriley@cumin1002"
  • 21:27 vriley@cumin1002: START - Cookbook sre.hosts.provision for host es1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:26 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es1047
  • 21:24 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host es1047
  • 21:24 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 21:23 vriley@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:21 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 21:20 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:20 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1047 - vriley@cumin1002"
  • 21:20 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1047 - vriley@cumin1002"
  • 21:16 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 21:16 vriley@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:13 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 21:13 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2005-dev.codfw.wmnet with OS bookworm
  • 21:02 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:02 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:15 sbassett: Deployed security mitigation update for T395468
  • 19:43 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 19:42 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 19:42 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 19:41 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 19:41 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 19:40 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 18:59 bking@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for wdqs2022.codfw.wmnet: Renew puppet certificate - bking@cumin1002
  • 18:39 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on A:cp-eqsin
  • 18:34 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on A:cp-codfw
  • 18:28 kcvelaga@deploy1003: Finished deploy [airflow-dags/analytics_product@52ec646]: T394526 (duration: 01m 35s)
  • 18:26 kcvelaga@deploy1003: Started deploy [airflow-dags/analytics_product@52ec646]: T394526
  • 18:14 sukhe@cumin1002: START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on A:cp-eqsin
  • 18:11 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2006-dev.codfw.wmnet with OS bookworm
  • 18:09 sukhe@cumin1002: START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on A:cp-codfw
  • 18:07 btullis@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-ctrl2002.codfw.wmnet
  • 18:07 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-ctrl2002.codfw.wmnet with OS bookworm
  • 17:58 btullis@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-ctrl2001.codfw.wmnet
  • 17:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-ctrl2001.codfw.wmnet with OS bookworm
  • 17:53 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2006-dev.codfw.wmnet with reason: host reimage
  • 17:53 ebernhardson@deploy1003: Finished deploy [airflow-dags/search@5c0689d]: sync rdf-spark-tools 0.3.158 artifacts (duration: 00m 19s)
  • 17:52 ebernhardson@deploy1003: Started deploy [airflow-dags/search@5c0689d]: sync rdf-spark-tools 0.3.158 artifacts
  • 17:50 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2006-dev.codfw.wmnet with reason: host reimage
  • 17:48 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-ctrl2002.codfw.wmnet with reason: host reimage
  • 17:43 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-ctrl2002.codfw.wmnet with reason: host reimage
  • 17:39 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-ctrl2001.codfw.wmnet with reason: host reimage
  • 17:33 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-ctrl2001.codfw.wmnet with reason: host reimage
  • 17:30 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2006-dev.codfw.wmnet with OS bookworm
  • 17:25 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-ctrl2002.codfw.wmnet with OS bookworm
  • 17:25 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-ctrl2002.codfw.wmnet - btullis@cumin1003"
  • 17:21 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2006-dev.codfw.wmnet with OS bookworm
  • 17:19 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-ctrl2002.codfw.wmnet - btullis@cumin1003"
  • 17:18 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-ctrl2002.codfw.wmnet on all recursors
  • 17:18 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-ctrl2002.codfw.wmnet on all recursors
  • 17:18 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:18 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-ctrl2002.codfw.wmnet - btullis@cumin1003"
  • 17:18 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-ctrl2002.codfw.wmnet - btullis@cumin1003"
  • 17:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-ctrl2001.codfw.wmnet with OS bookworm
  • 17:13 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-ctrl2001.codfw.wmnet - btullis@cumin1003"
  • 17:13 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-ctrl2001.codfw.wmnet - btullis@cumin1003"
  • 17:13 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:13 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-ctrl2001.codfw.wmnet on all recursors
  • 17:13 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-ctrl2001.codfw.wmnet on all recursors
  • 17:13 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:13 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-ctrl2001.codfw.wmnet - btullis@cumin1003"
  • 17:13 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 17:10 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-ctrl2001.codfw.wmnet - btullis@cumin1003"
  • 17:10 btullis@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:10 bking@cumin1002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: activate new plugins packages - bking@cumin1002 - T397227
  • 17:04 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:04 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:03 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:03 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:02 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:02 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:59 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:58 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:58 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
  • 16:53 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
  • 16:52 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
  • 16:48 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on A:cp-eqiad
  • 16:43 cdanis@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "feat: reverse deps - cdanis@cumin1002"
  • 16:43 cdanis@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: feat: reverse deps - cdanis@cumin1002
  • 16:43 cdanis@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: feat: reverse deps - cdanis@cumin1002
  • 16:43 cdanis@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "feat: reverse deps - cdanis@cumin1002"
  • 16:42 dancy@deploy1003: Installation of scap version "4.187.0" completed for 2 hosts
  • 16:41 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: activate new plugins packages - bking@cumin1002 - T397227
  • 16:40 dancy@deploy1003: Installing scap version "4.187.0" for 2 host(s)
  • 16:39 mszabo@deploy1003: Finished scap sync-world: Backport for Revert "Add user-related link colors to LinkRenderer::getLinkClasses" (T392775 T398714 T398717 T398952), Revert "UserLinker: remove back compat with old arguments of UserLinkRenderer", UpdateMessageJobTest: Read expected transver from latest (T398904) (duration: 09m 10s)
  • 16:37 cdanis@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "feat: reverse deps - cdanis@cumin1002"
  • 16:37 cdanis@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: feat: reverse deps - cdanis@cumin1002
  • 16:36 cdanis@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: feat: reverse deps - cdanis@cumin1002
  • 16:36 cdanis@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "feat: reverse deps - cdanis@cumin1002"
  • 16:35 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
  • 16:34 mszabo@deploy1003: tchanders, mszabo: Continuing with sync
  • 16:33 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudnet2006-dev.codfw.wmnet with OS bookworm
  • 16:32 mszabo@deploy1003: tchanders, mszabo: Backport for Revert "Add user-related link colors to LinkRenderer::getLinkClasses" (T392775 T398714 T398717 T398952), Revert "UserLinker: remove back compat with old arguments of UserLinkRenderer", UpdateMessageJobTest: Read expected transver from latest (T398904) synced to the testservers (see https://wikitech.wikimedia.org
  • 16:31 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
  • 16:30 mszabo@deploy1003: Started scap sync-world: Backport for Revert "Add user-related link colors to LinkRenderer::getLinkClasses" (T392775 T398714 T398717 T398952), Revert "UserLinker: remove back compat with old arguments of UserLinkRenderer", UpdateMessageJobTest: Read expected transver from latest (T398904)
  • 16:23 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2003.codfw.wmnet with OS bookworm
  • 16:23 sukhe@cumin1002: START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on A:cp-eqiad
  • 16:22 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on A:cp-drmrs
  • 16:21 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-eqsin and not P{cp[5017,5025].eqsin.wmnet} and A:cp - 2.8.15 upgrade (T398720)
  • 16:12 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 16:11 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
  • 16:07 btullis@cumin1003: START - Cookbook sre.ganeti.makevm for new host dse-k8s-ctrl2002.codfw.wmnet
  • 16:07 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 16:06 btullis@cumin1003: START - Cookbook sre.ganeti.makevm for new host dse-k8s-ctrl2001.codfw.wmnet
  • 15:57 sukhe@cumin1002: START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on A:cp-drmrs
  • 15:48 btullis@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-etcd2002.codfw.wmnet
  • 15:48 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-etcd2002.codfw.wmnet with OS bookworm
  • 15:44 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on A:cp-esams
  • 15:38 btullis@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-etcd2003.codfw.wmnet
  • 15:38 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-etcd2003.codfw.wmnet with OS bookworm
  • 15:31 bvibber@deploy1003: Finished scap sync-world: Backport for Support null values in data columns in transform output (T398597) (duration: 08m 52s)
  • 15:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-etcd2002.codfw.wmnet with reason: host reimage
  • 15:27 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-etcd2002.codfw.wmnet with reason: host reimage
  • 15:25 bvibber@deploy1003: bvibber: Continuing with sync
  • 15:24 bvibber@deploy1003: bvibber: Backport for Support null values in data columns in transform output (T398597) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:22 bvibber@deploy1003: Started scap sync-world: Backport for Support null values in data columns in transform output (T398597)
  • 15:21 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-etcd2003.codfw.wmnet with reason: host reimage
  • 15:20 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on A:cp-ulsfo
  • 15:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78815 and previous config saved to /var/cache/conftool/dbconfig/20250708-151939-root.json
  • 15:19 sukhe@cumin1002: START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on A:cp-esams
  • 15:18 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on A:cp-magru
  • 15:18 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-etcd2003.codfw.wmnet with reason: host reimage
  • 15:12 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-etcd2002.codfw.wmnet with OS bookworm
  • 15:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78814 and previous config saved to /var/cache/conftool/dbconfig/20250708-150434-root.json
  • 15:02 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2002.codfw.wmnet with OS bookworm
  • 14:57 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-etcd2002.codfw.wmnet - btullis@cumin1003"
  • 14:55 sukhe@cumin1002: START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on A:cp-ulsfo
  • 14:53 sukhe@cumin1002: START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on A:cp-magru
  • 14:53 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-etcd2002.codfw.wmnet - btullis@cumin1003"
  • 14:53 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-etcd2002.codfw.wmnet on all recursors
  • 14:53 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-etcd2002.codfw.wmnet on all recursors
  • 14:53 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:52 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-etcd2003.codfw.wmnet with OS bookworm
  • 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-etcd2003.codfw.wmnet - btullis@cumin1003"
  • 14:50 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-etcd2003.codfw.wmnet - btullis@cumin1003"
  • 14:50 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-etcd2003.codfw.wmnet on all recursors
  • 14:50 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-etcd2003.codfw.wmnet on all recursors
  • 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-etcd2003.codfw.wmnet - btullis@cumin1003"
  • 14:50 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-etcd2003.codfw.wmnet - btullis@cumin1003"
  • 14:50 pmiazga: Ran fixStuckGlobalRename.php for T398837
  • 14:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78813 and previous config saved to /var/cache/conftool/dbconfig/20250708-144928-root.json
  • 14:47 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-eqsin and not P{cp[5017,5025].eqsin.wmnet} and A:cp - 2.8.15 upgrade (T398720)
  • 14:45 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 14:45 btullis@cumin1003: START - Cookbook sre.ganeti.makevm for new host dse-k8s-etcd2003.codfw.wmnet
  • 14:41 moritzm: installing shadow security updates
  • 14:39 btullis@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-etcd2001.codfw.wmnet
  • 14:39 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-etcd2001.codfw.wmnet with OS bookworm
  • 14:36 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2001.codfw.wmnet with OS bookworm
  • 14:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78812 and previous config saved to /var/cache/conftool/dbconfig/20250708-143422-root.json
  • 14:28 btullis@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1185 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78811 and previous config saved to /var/cache/conftool/dbconfig/20250708-142635-marostegui.json
  • 14:26 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 14:26 btullis@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:23 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 14:23 btullis@cumin1003: START - Cookbook sre.ganeti.makevm for new host dse-k8s-etcd2002.codfw.wmnet
  • 14:21 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-etcd2001.codfw.wmnet with reason: host reimage
  • 14:18 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-etcd2001.codfw.wmnet with reason: host reimage
  • 13:53 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.topology-check (exit_code=0) Validate Gerrit topology (source=gerrit1003, replica=gerrit2002)
  • 13:53 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2002)
  • 13:50 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.topology-check (exit_code=99) Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 13:50 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 13:40 moritzm: installing werkzeug security updates
  • 13:27 cmooney@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:27 cmooney@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new ML mega-hosts in eqiad - cmooney@cumin2002"
  • 13:23 cmooney@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new ML mega-hosts in eqiad - cmooney@cumin2002"
  • 13:20 cmooney@cumin2002: START - Cookbook sre.dns.netbox
  • 13:20 moritzm: restart clamav on VRTS to pick up ICU security updates
  • 13:18 moritzm: restarting Postfix on mx* and crm2001 to pick up ICU security updates
  • 13:17 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Do not log rate-limiting rules if it wouldn't be applied - oblivian@cumin1003"
  • 13:17 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Do not log rate-limiting rules if it wouldn't be applied - oblivian@cumin1003
  • 13:17 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Do not log rate-limiting rules if it wouldn't be applied - oblivian@cumin1003
  • 13:17 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Do not log rate-limiting rules if it wouldn't be applied - oblivian@cumin1003"
  • 13:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-etcd2001.codfw.wmnet with OS bookworm
  • 13:04 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbprov2005.codfw.wmnet,dbprov1005.eqiad.wmnet with reason: MariaDB package update
  • 12:59 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2002.codfw.wmnet with reason: host reimage
  • 12:56 moritzm: installing ICU security updates on Bookworm
  • 12:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2003.codfw.wmnet with reason: host reimage
  • 12:56 moritzm: installing ICU security updates
  • 12:55 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-etcd2001.codfw.wmnet - btullis@cumin1003"
  • 12:55 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-etcd2001.codfw.wmnet - btullis@cumin1003"
  • 12:54 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-etcd2001.codfw.wmnet on all recursors
  • 12:54 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-etcd2001.codfw.wmnet on all recursors
  • 12:54 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:54 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-etcd2001.codfw.wmnet - btullis@cumin1003"
  • 12:54 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-etcd2001.codfw.wmnet - btullis@cumin1003"
  • 12:54 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2002.codfw.wmnet with reason: host reimage
  • 12:52 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2001.codfw.wmnet with reason: host reimage
  • 12:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 12:51 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2002.codfw.wmnet
  • 12:51 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 12:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 12:50 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2003.codfw.wmnet with reason: host reimage
  • 12:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 12:49 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:49 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be1001.eqiad.wmnet
  • 12:49 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:48 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2001.codfw.wmnet with reason: host reimage
  • 12:46 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2002.codfw.wmnet
  • 12:46 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 12:46 btullis@cumin1003: START - Cookbook sre.ganeti.makevm for new host dse-k8s-etcd2001.codfw.wmnet
  • 12:45 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2001.codfw.wmnet
  • 12:44 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host moss-be1001.eqiad.wmnet
  • 12:43 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apus-be1004.eqiad.wmnet
  • 12:40 moritzm: installing commons-beanutils security updates
  • 12:40 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2001.codfw.wmnet
  • 12:39 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host apus-be1004.eqiad.wmnet
  • 12:38 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be1003.eqiad.wmnet
  • 12:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apus-be2004.codfw.wmnet
  • 12:32 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cephosd2002
  • 12:32 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cephosd2002
  • 12:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host apus-be2004.codfw.wmnet
  • 12:31 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host moss-be1003.eqiad.wmnet
  • 12:31 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2003.codfw.wmnet
  • 12:30 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cephosd2002
  • 12:30 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cephosd2002.codfw.wmnet 235.32.192.10.in-addr.arpa 5.3.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 12:30 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache cephosd2002.codfw.wmnet 235.32.192.10.in-addr.arpa 5.3.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 12:30 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:30 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be1002.eqiad.wmnet
  • 12:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cephosd2003
  • 12:28 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cephosd2003
  • 12:28 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 12:27 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cephosd2003
  • 12:27 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cephosd2003.codfw.wmnet 240.48.192.10.in-addr.arpa 0.4.2.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 12:27 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache cephosd2003.codfw.wmnet 240.48.192.10.in-addr.arpa 0.4.2.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 12:27 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:26 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cephosd2001
  • 12:26 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cephosd2001
  • 12:26 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:26 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:26 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2003.codfw.wmnet
  • 12:25 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host moss-be1002.eqiad.wmnet
  • 12:25 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 12:24 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cephosd2001
  • 12:24 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cephosd2001.codfw.wmnet 133.0.192.10.in-addr.arpa 3.3.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 12:24 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache cephosd2001.codfw.wmnet 133.0.192.10.in-addr.arpa 3.3.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 12:24 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:24 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cephosd2001 - btullis@cumin1003"
  • 12:24 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cephosd2001 - btullis@cumin1003"
  • 12:21 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host cephosd2003
  • 12:21 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 12:21 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host cephosd2003.codfw.wmnet with OS bookworm
  • 12:21 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host cephosd2002
  • 12:20 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bookworm
  • 12:20 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host cephosd2001
  • 12:20 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host cephosd2001.codfw.wmnet with OS bookworm
  • 12:12 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 12:12 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 12:10 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 12:10 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 12:09 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 12:08 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 12:06 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 12:06 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling restart_daemons on A:ldap-replicas-eqiad
  • 12:03 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling restart_daemons on A:ldap-replicas-eqiad
  • 12:02 jmm@cumin1002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-eqiad
  • 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling restart_daemons on A:ldap-replicas-codfw
  • 12:00 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 12:00 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 12:00 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling restart_daemons on A:ldap-replicas-codfw
  • 11:59 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl1002.eqiad.wmnet
  • 11:54 btullis@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cephosd[2001-2003].codfw.wmnet with reason: Bootstrapping new ceph cluster
  • 11:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 11:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 11:52 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1002.eqiad.wmnet
  • 11:52 moritzm: restarting FPM on Phabricator nodes to pick up OpenSSL updates
  • 11:49 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl1001.eqiad.wmnet
  • 11:49 moritzm: restarting exim on Phabricator nodes to pick up OpenSSL updates
  • 11:44 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1001.eqiad.wmnet
  • 11:42 jmm@cumin1002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-eqiad
  • 11:39 jynus: upgrade db2201 mariadb package T394487
  • 11:37 jmm@cumin1002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:relforge
  • 11:36 jmm@cumin1002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:relforge
  • 11:35 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet
  • 11:35 hashar: Restarted Apache on gerrit1003 and gerrit2002
  • 11:31 zabe@deploy1003: Finished scap sync-world: Backport for Remove redundant group0 config for categorylinks, Set categorylinks to read new in cebwiki (T397912) (duration: 09m 35s)
  • 11:29 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: sync
  • 11:29 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: sync
  • 11:27 moritzm: restarting apache on mirror1001 to pick up openssl sec updates
  • 11:27 jmm@cumin1002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-codfw
  • 11:25 zabe@deploy1003: zabe: Continuing with sync
  • 11:24 zabe@deploy1003: zabe: Backport for Remove redundant group0 config for categorylinks, Set categorylinks to read new in cebwiki (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:24 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet
  • 11:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1159 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78807 and previous config saved to /var/cache/conftool/dbconfig/20250708-112344-root.json
  • 11:22 zabe@deploy1003: Started scap sync-world: Backport for Remove redundant group0 config for categorylinks, Set categorylinks to read new in cebwiki (T397912)
  • 11:20 jynus: upgrade db1216 mariadb package T394487
  • 11:15 moritzm: restarting slapd on seaborgium/serpens to pick up OpenSSL updates
  • 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
  • 11:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 11:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1159 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78806 and previous config saved to /var/cache/conftool/dbconfig/20250708-110838-root.json
  • 11:07 jmm@cumin1002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-codfw
  • 11:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2157 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78805 and previous config saved to /var/cache/conftool/dbconfig/20250708-110656-root.json
  • 11:06 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
  • 11:06 jmm@cumin1002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:cloudelastic
  • 11:04 jmm@cumin1002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:cloudelastic
  • 11:03 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2201.codfw.wmnet,db1216.eqiad.wmnet with reason: MariaDB package update
  • 11:00 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
  • 11:00 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host matomo1003.eqiad.wmnet
  • 10:56 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host matomo1003.eqiad.wmnet
  • 10:56 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
  • 10:54 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-cluster
  • 10:54 Emperor: reboot apus frontends in codfw T395240
  • 10:54 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-all
  • 10:53 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 10:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1159 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78803 and previous config saved to /var/cache/conftool/dbconfig/20250708-105332-root.json
  • 10:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for Fully get rid of tracking and updating pages (T398033), api-testing: Loosen the assert on max-age header, Fully get rid of tracking and updating pages (T398033) (duration: 09m 33s)
  • 10:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2157 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78802 and previous config saved to /var/cache/conftool/dbconfig/20250708-105151-root.json
  • 10:49 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1005.eqiad.wmnet
  • 10:47 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 10:45 ladsgroup@deploy1003: ladsgroup: Backport for Fully get rid of tracking and updating pages (T398033), api-testing: Loosen the assert on max-age header, Fully get rid of tracking and updating pages (T398033) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:44 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-conf1005.eqiad.wmnet
  • 10:42 ladsgroup@deploy1003: Started scap sync-world: Backport for Fully get rid of tracking and updating pages (T398033), api-testing: Loosen the assert on max-age header, Fully get rid of tracking and updating pages (T398033)
  • 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1159 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78801 and previous config saved to /var/cache/conftool/dbconfig/20250708-103826-root.json
  • 10:37 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-cluster
  • 10:37 Emperor: reboot apus frontends in eqiad T395240
  • 10:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2157 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78800 and previous config saved to /var/cache/conftool/dbconfig/20250708-103645-root.json
  • 10:34 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all
  • 10:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1159 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78799 and previous config saved to /var/cache/conftool/dbconfig/20250708-103106-marostegui.json
  • 10:31 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1159.eqiad.wmnet with reason: Maintenance
  • 10:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1159 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78798 and previous config saved to /var/cache/conftool/dbconfig/20250708-102746-root.json
  • 10:27 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1004.eqiad.wmnet
  • 10:21 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-conf1004.eqiad.wmnet
  • 10:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2157 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78797 and previous config saved to /var/cache/conftool/dbconfig/20250708-102140-root.json
  • 10:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1159 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78796 and previous config saved to /var/cache/conftool/dbconfig/20250708-102114-marostegui.json
  • 10:21 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1159.eqiad.wmnet with reason: Maintenance
  • 10:20 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1159.eqiad.wmnet with reason: Maintenance
  • 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 10:14 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
  • 10:14 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
  • 10:12 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 10:11 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-eqiad
  • 10:09 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-codfw
  • 10:07 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-codfw
  • 10:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2157', diff saved to https://phabricator.wikimedia.org/P78795 and previous config saved to /var/cache/conftool/dbconfig/20250708-100434-marostegui.json
  • 10:00 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 09:53 Amir1: dropping term store tables on s8 sanitarium master (T351820)
  • 09:51 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 09:51 moritzm: installing openssl security updates on Bullseye
  • 09:51 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 09:51 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 09:50 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 09:41 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 09:41 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 09:15 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.9 refs T392179
  • 09:12 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling reboot on A:schema-eqiad
  • 09:04 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling reboot on A:schema-eqiad
  • 08:59 aklapper@deploy1003: Finished scap sync-world: testwikis to 1.45.0-wmf.9 refs T392179 (duration: 43m 18s)
  • 08:52 moritzm: installing nginx security updates
  • 08:48 moritzm: installing Redis security updates
  • 08:30 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 08:30 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 08:30 moritzm: created a stub user "bumpuid" to move the allocation of UIDs for accounts created in Wikimedia IDM to 100000+ T355663
  • 08:30 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 08:28 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 08:26 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 08:26 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 08:16 aklapper@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.9 refs T392179
  • 08:11 moritzm: installing postgresql-15 security updates
  • 08:11 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:11 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:06 gmodena@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:06 gmodena@deploy1003: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:02 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:01 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 07:55 fabfur: enabling puppet on A:cp (T329332)
  • 07:54 marostegui: Migrate s3 eqiad to SBR T383795
  • 07:45 fabfur: temporarily disable puppet on A:cp to apply https://gerrit.wikimedia.org/r/1135643 (T329332)
  • 07:42 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 07:42 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 07:30 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade Replica to GitLab 18.0
  • 07:19 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade Replica to GitLab 18.0
  • 07:14 tchanders@deploy1003: Finished scap sync-world: Backport for temp accounts: Separate digits in user names with hyphens (T381845) (duration: 11m 02s)
  • 07:09 tchanders@deploy1003: tchanders: Continuing with sync
  • 07:05 tchanders@deploy1003: tchanders: Backport for temp accounts: Separate digits in user names with hyphens (T381845) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:03 tchanders@deploy1003: Started scap sync-world: Backport for temp accounts: Separate digits in user names with hyphens (T381845)
  • 06:35 moritzm: rebalance following reimages T382513
  • 06:31 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Revert - oblivian@cumin1003"
  • 06:31 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Revert - oblivian@cumin1003
  • 06:30 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Revert - oblivian@cumin1003
  • 06:30 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Revert - oblivian@cumin1003"
  • 06:15 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix varnis logging (take 2) - oblivian@cumin1003"
  • 06:14 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix varnis logging (take 2) - oblivian@cumin1003
  • 06:14 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix varnis logging (take 2) - oblivian@cumin1003
  • 06:14 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix varnis logging (take 2) - oblivian@cumin1003"
  • 05:52 marostegui: Migrate s3 codfw to SBR T383795
  • 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1237 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78792 and previous config saved to /var/cache/conftool/dbconfig/20250708-054825-root.json
  • 05:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78791 and previous config saved to /var/cache/conftool/dbconfig/20250708-054329-root.json
  • 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Reverty - oblivian@cumin1003"
  • 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Reverty - oblivian@cumin1003
  • 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Reverty - oblivian@cumin1003
  • 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Reverty - oblivian@cumin1003"
  • 05:42 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Reverty - oblivian@cumin1003"
  • 05:42 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Reverty - oblivian@cumin1003
  • 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Reverty - oblivian@cumin1003
  • 05:41 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Reverty - oblivian@cumin1003"
  • 05:35 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Feature: better logging of varnish rate-limits - oblivian@cumin1003"
  • 05:35 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Feature: better logging of varnish rate-limits - oblivian@cumin1003
  • 05:35 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Feature: better logging of varnish rate-limits - oblivian@cumin1003
  • 05:35 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Feature: better logging of varnish rate-limits - oblivian@cumin1003"
  • 05:33 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gerrit2003.wikimedia.org with reason: WIP
  • 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1237 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78790 and previous config saved to /var/cache/conftool/dbconfig/20250708-053320-root.json
  • 05:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78789 and previous config saved to /var/cache/conftool/dbconfig/20250708-052823-root.json
  • 05:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1237 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78788 and previous config saved to /var/cache/conftool/dbconfig/20250708-051814-root.json
  • 05:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78787 and previous config saved to /var/cache/conftool/dbconfig/20250708-051318-root.json
  • 05:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1237 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78786 and previous config saved to /var/cache/conftool/dbconfig/20250708-050308-root.json
  • 04:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78785 and previous config saved to /var/cache/conftool/dbconfig/20250708-045812-root.json
  • 04:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1237 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P78784 and previous config saved to /var/cache/conftool/dbconfig/20250708-044803-root.json
  • 04:39 marostegui@dns1006: END - running authdns-update
  • 04:38 marostegui@dns1006: START - running authdns-update
  • 04:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1162 T398906', diff saved to https://phabricator.wikimedia.org/P78783 and previous config saved to /var/cache/conftool/dbconfig/20250708-043814-marostegui.json
  • 04:38 marostegui@dns1006: END - running authdns-update
  • 04:37 marostegui@dns1006: START - running authdns-update
  • 04:36 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1222 to s2 primary and set section read-write T398906', diff saved to https://phabricator.wikimedia.org/P78782 and previous config saved to /var/cache/conftool/dbconfig/20250708-043654-root.json
  • 04:36 marostegui@cumin1002: dbctl commit (dc=all): 'Set s2 eqiad as read-only for maintenance - T398906', diff saved to https://phabricator.wikimedia.org/P78781 and previous config saved to /var/cache/conftool/dbconfig/20250708-043628-root.json
  • 04:36 marostegui: Starting s2 eqiad failover from db1162 to db1222 - T398906
  • 04:26 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1222 with weight 0 T398906', diff saved to https://phabricator.wikimedia.org/P78780 and previous config saved to /var/cache/conftool/dbconfig/20250708-042646-root.json
  • 04:26 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 T398906
  • 04:04 mwpresync@deploy1003: Pruned MediaWiki: 1.45.0-wmf.6 (duration: 04m 24s)
  • 02:42 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
  • 02:23 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
  • 02:20 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
  • 01:59 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
  • 00:21 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 00:20 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply

2025-07-07

  • 22:58 bking@cumin1002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 22:35 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
  • 22:16 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
  • 22:13 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
  • 21:58 maryum: Deployed security fix for T397577
  • 21:52 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
  • 21:36 maryum: Deployed security fix for T398636
  • 21:11 zabe@deploy1003: Finished scap sync-world: Backport for Straight join collation table to make sure it is last (T398860) (duration: 10m 33s)
  • 21:05 zabe@deploy1003: zabe: Continuing with sync
  • 21:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
  • 21:02 zabe@deploy1003: zabe: Backport for Straight join collation table to make sure it is last (T398860) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:01 ladsgroup@cumin1002: START - Cookbook sre.wikireplicas.update-views
  • 21:00 zabe@deploy1003: Started scap sync-world: Backport for Straight join collation table to make sure it is last (T398860)
  • 20:33 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
  • 20:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast6003.wikimedia.org to drbd
  • 20:16 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
  • 20:13 ebernhardson@deploy1003: Finished scap sync-world: Backport for cirrus: Start AB test of completion suggester fuzziness (T397732) (duration: 10m 28s)
  • 20:12 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
  • 20:11 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 20:10 bking@cumin1002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 20:07 ebernhardson@deploy1003: ebernhardson: Continuing with sync
  • 20:05 ebernhardson@deploy1003: ebernhardson: Backport for cirrus: Start AB test of completion suggester fuzziness (T397732) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:02 ebernhardson@deploy1003: Started scap sync-world: Backport for cirrus: Start AB test of completion suggester fuzziness (T397732)
  • 19:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast6003.wikimedia.org to drbd
  • 19:51 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
  • 19:39 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 19:38 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 19:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1179.eqiad.wmnet with OS bullseye
  • 19:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:31 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1179.eqiad.wmnet with reason: host reimage
  • 19:10 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1179.eqiad.wmnet with reason: host reimage
  • 18:59 bvibber@deploy1003: Finished scap sync-world: Backport for Fix for validation error display in transformed chart data (T398597) (duration: 08m 40s)
  • 18:58 sukhe: sukhe@cp7006:/var/run/confd-template$ sudo rm _etc_haproxy_conf.d_tls.cfg.err
  • 18:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1179.eqiad.wmnet with OS bullseye
  • 18:55 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 18:54 bvibber@deploy1003: bvibber: Continuing with sync
  • 18:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1179.eqiad.wmnet with OS bullseye
  • 18:53 bvibber@deploy1003: bvibber: Backport for Fix for validation error display in transformed chart data (T398597) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:51 bvibber@deploy1003: Started scap sync-world: Backport for Fix for validation error display in transformed chart data (T398597)
  • 18:40 zabe@deploy1003: Finished scap sync-world: Backport for Revert^2 "Set categorylinks to read new in medium wikis" (T397912) (duration: 09m 54s)
  • 18:39 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1179.eqiad.wmnet with OS bullseye
  • 18:35 zabe@deploy1003: zabe: Continuing with sync
  • 18:32 zabe@deploy1003: zabe: Backport for Revert^2 "Set categorylinks to read new in medium wikis" (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:31 zabe@deploy1003: Started scap sync-world: Backport for Revert^2 "Set categorylinks to read new in medium wikis" (T397912)
  • 18:12 zabe@deploy1003: Finished scap sync-world: Backport for Apply conditions to correct column (T398823) (duration: 11m 14s)
  • 18:10 urandom: bootstrapping Cassandra/sessionstore1006-a — T391544
  • 18:09 sukhe@dns1004: END - running authdns-update
  • 18:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.sanitarium_restart (exit_code=0)
  • 18:08 sukhe@dns1004: START - running authdns-update
  • 18:06 zabe@deploy1003: zabe: Continuing with sync
  • 18:04 sukhe: [end] rolling upgrade of haproxy on A:dnsbox to 2.6.12-1+deb12u2
  • 18:03 zabe@deploy1003: zabe: Backport for Apply conditions to correct column (T398823) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:00 zabe@deploy1003: Started scap sync-world: Backport for Apply conditions to correct column (T398823)
  • 17:58 bking@cumin1002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 17:58 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 17:58 ladsgroup@cumin1002: START - Cookbook sre.mysql.sanitarium_restart
  • 17:58 ladsgroup@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitarium_restart (exit_code=99)
  • 17:57 ladsgroup@cumin1002: START - Cookbook sre.mysql.sanitarium_restart
  • 17:45 sukhe: [start] rolling upgrade of haproxy on A:dnsbox to 2.6.12-1+deb12u2
  • 17:40 bking@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=search-omega*,name=eqiad
  • 17:40 bking@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=search-psi*,name=eqiad
  • 17:40 bking@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=search*,name=eqiad
  • 17:35 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1006.eqiad.wmnet with OS bullseye
  • 17:12 bking@cumin1002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 17:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1006.eqiad.wmnet with reason: host reimage
  • 17:07 bking@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=search-psi*,name=eqiad
  • 17:07 bking@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=search-omega*,name=eqiad
  • 17:06 bking@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=search*,name=eqiad
  • 17:05 eevans@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1006.eqiad.wmnet with reason: host reimage
  • 16:49 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host sessionstore1006.eqiad.wmnet with OS bullseye
  • 16:49 taavi@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2003-dev.codfw.wmnet
  • 16:49 eevans@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore1006.eqiad.wmnet with OS bullseye
  • 16:43 taavi@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudgw2003-dev.codfw.wmnet
  • 16:38 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host sessionstore1006.eqiad.wmnet with OS bullseye
  • 16:20 eevans@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore1006.eqiad.wmnet with OS bullseye
  • 16:16 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 16:15 bking@cumin1002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 16:08 elukey: kafka preferred-replica-election on kafka1011 to rebalance partition leaders on kafka-jumbo
  • 16:04 elukey: restart kafka on kafka1015 (fourth and last node without restart in the previous cookbook run)
  • 16:02 elukey: restart kafka on kafka1014 (third node without restart in the previous cookbook run)
  • 16:00 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 15:59 elukey: restart kafka on kafka1013 (second node without restart in the previous cookbook run)
  • 15:56 elukey: restart kafka on kafka1012 (first node without restart in the previous cookbook run)
  • 15:55 elukey: kafka-preferred-replica on kafka-jumbo
  • 15:46 moritzm: installing busybox updates from Bookworm point release
  • 15:37 moritzm: installing zsh updates from Bookworm point release
  • 15:33 moritzm: installing postgresql security updates
  • 15:28 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling reboot on A:schema-codfw
  • 15:26 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thanos-be[2001-2004].codfw.wmnet
  • 15:25 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:25 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: thanos-be[2001-2004].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
  • 15:24 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: thanos-be[2001-2004].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
  • 15:24 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 15:24 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: sync
  • 15:22 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 15:22 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: sync
  • 15:22 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling reboot on A:schema-codfw
  • 15:21 mvernon@cumin2002: START - Cookbook sre.dns.netbox
  • 15:18 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2006.codfw.wmnet
  • 15:18 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:18 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:17 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:17 elukey@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:13 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:13 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:12 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-gp2006.codfw.wmnet
  • 15:10 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts thanos-be[2001-2004].codfw.wmnet
  • 15:02 brouberol@cumin2002: END (FAIL) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=99) rolling restart_daemons on A:kafka-jumbo-eqiad
  • 15:02 bking@cumin1002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 15:00 sukhe: sudo cumin -b1 -s120 'A:dnsbox and not P{dns7001*}' "run-puppet-agent --enable 'merging CR 1166223'": T374619
  • 15:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-ctrl1002.eqiad.wmnet
  • 14:58 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: [done] testing CR 1166223: T374619]
  • 14:58 vgutierrez: switching lvs3010 to katran - T396561
  • 14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-ctrl1002.eqiad.wmnet
  • 14:54 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing CR 1166223: T374619]
  • 14:47 sukhe: sudo cumin 'A:dnsbox' "disable-puppet 'merging CR 1166223'": rolling out prom metrics for anycast-hc: T374619
  • 14:46 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1006.eqiad.wmnet with reason: host reimage
  • 14:42 eevans@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1006.eqiad.wmnet with reason: host reimage
  • 14:41 vgutierrez: switching lvs6003 to katran - T396561
  • 14:39 sukhe: sudo cumin -b1 -s10 'A:wikidough' "run-puppet-agent --enable 'merging CR 1166838'"
  • 14:38 sukhe: sudo cumin -s1 -b10 'A:wikidough' "run-puppet-agent --enable 'merging CR 1166838'"
  • 14:32 sukhe: sudo cumin 'A:wikidough' "disable-puppet 'merging CR 1166838'"
  • 14:28 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 14:26 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host sessionstore1006.eqiad.wmnet with OS bullseye
  • 14:25 eevans@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore1006.eqiad.wmnet with OS bullseye
  • 14:22 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 14:22 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 14:18 brouberol@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-jumbo-eqiad
  • 14:14 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host sessionstore1006.eqiad.wmnet with OS bullseye
  • 14:10 urandom: decommissioning Cassandra/sessionstore-a — T391544
  • 14:09 sukhe: sudo cumin -b1 -s10 'A:dnsbox' "run-puppet-agent --enable 'merging CR 1166210'"
  • 14:07 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2006-dev.codfw.wmnet with OS bookworm
  • 14:05 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 18.0
  • 14:03 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@f79034f]: remove dumps 1.0 sensor from SLIS (duration: 00m 46s)
  • 14:02 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@f79034f]: remove dumps 1.0 sensor from SLIS
  • 13:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-ctrl1003.eqiad.wmnet
  • 13:54 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 18.0
  • 13:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-ctrl1003.eqiad.wmnet
  • 13:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-ctrl2002.codfw.wmnet
  • 13:51 sukhe: sudo cumin 'A:dnsbox' "disable-puppet 'merging CR 1166210'"
  • 13:49 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2006-dev.codfw.wmnet with reason: host reimage
  • 13:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-ctrl2002.codfw.wmnet
  • 13:47 cgoubert@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker1069.eqiad.wmnet
  • 13:47 cgoubert@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker1069.eqiad.wmnet
  • 13:46 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-worker1069.eqiad.wmnet
  • 13:46 cgoubert@cumin1003: START - Cookbook sre.hosts.remove-downtime for wikikube-worker1069.eqiad.wmnet
  • 13:45 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2006-dev.codfw.wmnet with reason: host reimage
  • 13:45 zabe@deploy1003: Finished scap sync-world: Backport for Revert "Set categorylinks to read new in medium wikis" (duration: 07m 59s)
  • 13:45 claime: homer "cr*eqiad*" commit 'wikikube-worker1069 back to active'
  • 13:42 cgoubert@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:41 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:41 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:40 zabe@deploy1003: zabe: Continuing with sync
  • 13:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-ctrl2003.codfw.wmnet
  • 13:39 cgoubert@cumin1003: START - Cookbook sre.dns.netbox
  • 13:39 zabe@deploy1003: zabe: Backport for Revert "Set categorylinks to read new in medium wikis" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:39 cgoubert@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:37 zabe@deploy1003: Started scap sync-world: Backport for Revert "Set categorylinks to read new in medium wikis"
  • 13:36 cgoubert@cumin1003: START - Cookbook sre.dns.netbox
  • 13:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-ctrl2003.codfw.wmnet
  • 13:34 sukhe: sudo cumin -b11 'C:bird' "run-puppet-agent --enable 'merging CR 1166222'": NOOP change
  • 13:31 zabe@deploy1003: zabe: Continuing with sync
  • 13:30 zabe@deploy1003: zabe: Backport for Set categorylinks to read new in medium wikis (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:28 zabe@deploy1003: Started scap sync-world: Backport for Set categorylinks to read new in medium wikis (T397912)
  • 13:28 sukhe: sudo cumin 'C:bird' "disable-puppet 'merging CR 1166222'"
  • 13:27 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephmon2006-dev.codfw.wmnet with OS bookworm
  • 13:26 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2005-dev.codfw.wmnet with OS bookworm
  • 13:19 ladsgroup@deploy1003: Finished scap sync-world: Backport for mrwiki: Correct draft namespace spelling (T398792) (duration: 09m 26s)
  • 13:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2146* gradually with 4 steps - Work done
  • 13:13 ladsgroup@deploy1003: ladsgroup, hamishz: Continuing with sync
  • 13:11 ladsgroup@deploy1003: ladsgroup, hamishz: Backport for mrwiki: Correct draft namespace spelling (T398792) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:09 ladsgroup@deploy1003: Started scap sync-world: Backport for mrwiki: Correct draft namespace spelling (T398792)
  • 13:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for Drop ability to use VueTest on a wiki (T357475) (duration: 37m 21s)
  • 13:07 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2005-dev.codfw.wmnet with reason: host reimage
  • 13:05 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2005-dev.codfw.wmnet with reason: host reimage
  • 12:59 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.topology-check (exit_code=99) Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 12:59 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 12:58 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.topology-check (exit_code=99) Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 12:58 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 12:57 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.topology-check (exit_code=0) Validate Gerrit topology (source=gerrit1003, replica=gerrit2002)
  • 12:57 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2002)
  • 12:55 ladsgroup@deploy1003: ladsgroup, jforrester: Continuing with sync
  • 12:54 ladsgroup@deploy1003: ladsgroup, jforrester: Backport for Drop ability to use VueTest on a wiki (T357475) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:47 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephmon2005-dev.codfw.wmnet with OS bookworm
  • 12:35 akosiaris@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2042.codfw.wmnet
  • 12:35 akosiaris@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2042.codfw.wmnet
  • 12:34 akosiaris@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2046.codfw.wmnet
  • 12:34 akosiaris@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2046.codfw.wmnet
  • 12:32 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2146* gradually with 4 steps - Work done
  • 12:30 ladsgroup@deploy1003: Started scap sync-world: Backport for Drop ability to use VueTest on a wiki (T357475)
  • 12:28 ladsgroup@deploy1003: Finished scap sync-world: Backport for Revert "Increase max db connection count before circuit breaking" (T398692) (duration: 08m 13s)
  • 12:22 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 12:21 ladsgroup@deploy1003: ladsgroup: Backport for Revert "Increase max db connection count before circuit breaking" (T398692) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:19 ladsgroup@deploy1003: Started scap sync-world: Backport for Revert "Increase max db connection count before circuit breaking" (T398692)
  • 12:18 ladsgroup@deploy1003: Finished scap sync-world: Backport for Use dblist for wikilove (duration: 12m 28s)
  • 12:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS trixie
  • 12:10 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum6001.drmrs.wmnet to drbd
  • 12:08 ladsgroup@deploy1003: ladsgroup: Backport for Use dblist for wikilove synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:06 ladsgroup@deploy1003: Started scap sync-world: Backport for Use dblist for wikilove
  • 12:04 XioNoX: reboot lsw1-a8-codfw - T398433
  • 12:03 akosiaris@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2046.codfw.wmnet
  • 12:03 akosiaris@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2046.codfw.wmnet
  • 12:02 akosiaris@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2042.codfw.wmnet
  • 12:02 akosiaris@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2042.codfw.wmnet
  • 12:00 ladsgroup@deploy1003: Finished scap sync-world: Backport for Revert^2 "Clean up EventBus and jobs config" (duration: 35m 06s)
  • 12:00 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum6001.drmrs.wmnet to drbd
  • 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh6001.wikimedia.org to drbd
  • 11:59 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2146.codfw.wmnet with reason: Just in case (T398433)
  • 11:56 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS trixie
  • 11:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db2146 T398433', diff saved to https://phabricator.wikimedia.org/P78771 and previous config saved to /var/cache/conftool/dbconfig/20250707-115457-ladsgroup.json
  • 11:51 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh6001.wikimedia.org to drbd
  • 11:47 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 11:46 ladsgroup@deploy1003: ladsgroup: Backport for Revert^2 "Clean up EventBus and jobs config" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:42 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS trixie
  • 11:25 ladsgroup@deploy1003: Started scap sync-world: Backport for Revert^2 "Clean up EventBus and jobs config"
  • 11:10 root@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
  • 11:06 root@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
  • 11:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1250.eqiad.wmnet with reason: Maintenance
  • 11:02 moritzm: installing modsecurity-apache security updates
  • 10:53 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Maintenance
  • 10:45 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 10:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2234].codfw.wmnet with reason: Maintenance
  • 10:35 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 10:30 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 10:24 Emperor: remove swift-account-stats_machinetranslation:prod time & service from thanos-fe1004 T335491
  • 10:17 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 10:13 root@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
  • 10:09 root@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
  • 09:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 09:43 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 09:42 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 09:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 09:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1250.eqiad.wmnet with reason: Maintenance
  • 09:18 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 09:13 marostegui: Failover m2 from db1250 to db1228 - T397633
  • 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 09:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2233].codfw.wmnet,db[1217,1228,1250].eqiad.wmnet with reason: maintenance
  • 08:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1237.eqiad.wmnet with reason: Maintenance
  • 08:01 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Feature: logging of deny actions; add rename functionality - oblivian@cumin1003"
  • 08:01 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Feature: logging of deny actions; add rename functionality - oblivian@cumin1003
  • 08:00 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Feature: logging of deny actions; add rename functionality - oblivian@cumin1003
  • 08:00 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Feature: logging of deny actions; add rename functionality - oblivian@cumin1003"
  • 08:00 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1237.eqiad.wmnet with reason: Maintenance
  • 07:53 vgutierrez: repooling cp7006 with Ia82b93 applied - T397917
  • 07:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1237 T397612', diff saved to https://phabricator.wikimedia.org/P78763 and previous config saved to /var/cache/conftool/dbconfig/20250707-075308-root.json
  • 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1220 to x1 primary and set section read-write T397612', diff saved to https://phabricator.wikimedia.org/P78762 and previous config saved to /var/cache/conftool/dbconfig/20250707-075254-root.json
  • 07:51 marostegui@dns1006: END - running authdns-update
  • 07:50 marostegui@dns1006: START - running authdns-update
  • 07:25 vgutierrez: depooling cp7006 to test Ia82b93 - T397917
  • 07:25 marostegui: Starting x1 eqiad failover from db1237 to db1220 - T397612
  • 07:22 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1220 with weight 0 T397612', diff saved to https://phabricator.wikimedia.org/P78760 and previous config saved to /var/cache/conftool/dbconfig/20250707-072157-root.json
  • 07:13 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Primary switchover x1 T397612
  • 07:11 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 07:10 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 07:04 vgutierrez: testing haproxy 2.8.15 in cp5017 and cp5025 - T398720
  • 06:29 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 06:29 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply

2025-07-04

  • 21:39 krinkle@deploy1003: Finished scap sync-world: Backport for beta: Change loginwiki/metawiki/auth canonical to beta.wmcloud.org (T289318) (duration: 18m 12s)
  • 21:33 krinkle@deploy1003: krinkle: Continuing with sync
  • 21:23 krinkle@deploy1003: krinkle: Backport for beta: Change loginwiki/metawiki/auth canonical to beta.wmcloud.org (T289318) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:21 krinkle@deploy1003: Started scap sync-world: Backport for beta: Change loginwiki/metawiki/auth canonical to beta.wmcloud.org (T289318)
  • 20:32 krinkle@deploy1003: Finished scap sync-world: Backport for beta: Include allowance for wmcloud.org in wgGraphAllowedDomains (T289318), beta: Change Beta wikidata canonical to beta.wmcloud.org (T289318) (duration: 94m 52s)
  • 20:26 krinkle@deploy1003: krinkle: Continuing with sync
  • 18:59 krinkle@deploy1003: krinkle: Backport for beta: Include allowance for wmcloud.org in wgGraphAllowedDomains (T289318), beta: Change Beta wikidata canonical to beta.wmcloud.org (T289318) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:57 krinkle@deploy1003: Started scap sync-world: Backport for beta: Include allowance for wmcloud.org in wgGraphAllowedDomains (T289318), beta: Change Beta wikidata canonical to beta.wmcloud.org (T289318)
  • 15:14 vgutierrez: fetch haproxy 2.8.15 on thirdparty/haproxy28 component for bullseye-wikimedia (apt.wm.o)
  • 14:46 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bullseye
  • 14:40 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1179.eqiad.wmnet with OS bullseye
  • 14:36 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
  • 14:29 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 14:20 vgutierrez: repooling cp7006
  • 14:20 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 14:12 vgutierrez: depooling cp7006 for testing purposes
  • 14:09 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 14:06 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1179.eqiad.wmnet with OS bullseye
  • 14:01 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 13:15 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 13:08 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 12:59 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 12:51 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 12:31 vgutierrez: repool cp7006
  • 12:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp7006.magru.wmnet
  • 12:31 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp7006.magru.wmnet
  • 12:11 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@38ba3ec]: bump section topics to v1.8.0 (duration: 00m 49s)
  • 12:11 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@38ba3ec]: bump section topics to v1.8.0
  • 11:08 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release v10.0.2 with ibgp function in plugin - cmooney@cumin1003
  • 11:05 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release v10.0.2 with ibgp function in plugin - cmooney@cumin1003
  • 10:56 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 32 hosts with reason: maintenance
  • 10:51 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 10:43 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2203,2212].codfw.wmnet with reason: Maintenance
  • 10:41 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 10:27 cgoubert@deploy1003: Unlocked for deployment [ALL REPOSITORIES]: Dragonfly supernodes reboot (duration: 09m 07s)
  • 10:26 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode1001.eqiad.wmnet
  • 10:23 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode1001.eqiad.wmnet
  • 10:22 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode2001.codfw.wmnet
  • 10:18 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode2001.codfw.wmnet
  • 10:18 cgoubert@deploy1003: Locking from deployment [ALL REPOSITORIES]: Dragonfly supernodes reboot
  • 10:13 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 10:13 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 10:01 jynus@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for backupmon1001.eqiad.wmnet: Renew puppet certificate - jynus@cumin1002
  • 09:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install6002.wikimedia.org to drbd
  • 09:07 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backupmon1001.eqiad.wmnet with reason: Maintenance and reboot
  • 08:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install6002.wikimedia.org to drbd
  • 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir6001.drmrs.wmnet to drbd
  • 08:48 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2077.codfw.wmnet
  • 08:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir6001.drmrs.wmnet to drbd
  • 08:37 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2077.codfw.wmnet
  • 08:33 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti6001.drmrs.wmnet to cluster drmrs01 and group B12
  • 08:32 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti6001.drmrs.wmnet to cluster drmrs01 and group B12
  • 08:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
  • 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
  • 08:04 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2020.codfw.wmnet
  • 08:04 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:04 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2020.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti6001.drmrs.wmnet with OS bookworm
  • 08:03 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2020.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 07:58 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 07:56 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7006.magru.wmnet with reason: testing
  • 07:53 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts ganeti2020.codfw.wmnet
  • 07:53 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2019.codfw.wmnet
  • 07:53 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:53 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2019.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 07:53 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2019.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti6001.drmrs.wmnet with reason: host reimage
  • 07:42 vgutierrez: depooling cp7006 for testing purposes
  • 07:42 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 07:39 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti6001.drmrs.wmnet with reason: host reimage
  • 07:36 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts ganeti2019.codfw.wmnet
  • 07:21 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti6001.drmrs.wmnet with OS bookworm
  • 07:19 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti6001.drmrs.wmnet with reason: reimage
  • 07:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetserver2003.codfw.wmnet
  • 07:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetserver2003.codfw.wmnet
  • 06:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc1002.eqiad.wmnet
  • 06:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc1002.eqiad.wmnet
  • 06:32 moritzm: failover Ganeti master in drmrs01 to ganeti6003 T382513
  • 06:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum6001.drmrs.wmnet to plain
  • 06:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum6001.drmrs.wmnet to plain
  • 06:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh6001.wikimedia.org to plain
  • 06:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh6001.wikimedia.org to plain
  • 06:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir6001.drmrs.wmnet to plain
  • 06:26 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir6001.drmrs.wmnet to plain
  • 06:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc1001.eqiad.wmnet
  • 06:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install6002.wikimedia.org to plain
  • 06:24 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install6002.wikimedia.org to plain
  • 06:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc1001.eqiad.wmnet
  • 06:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3007.wikimedia.org
  • 06:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast3007.wikimedia.org
  • 04:32 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 04:25 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 03:52 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 03:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 03:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 03:46 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 03:44 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 03:44 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 03:22 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 03:21 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1042
  • 03:20 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1042
  • 03:19 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 03:19 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [cloudcephosd1042] - vriley@cumin1002"
  • 03:19 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [cloudcephosd1042] - vriley@cumin1002"
  • 03:15 vriley@cumin1002: START - Cookbook sre.dns.netbox

2025-07-03

  • 21:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast6003.wikimedia.org to plain
  • 21:19 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast6003.wikimedia.org to plain
  • 21:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6001.drmrs.wmnet
  • 21:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6001.drmrs.wmnet
  • 21:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast6003.wikimedia.org to drbd
  • 21:16 zabe@deploy1003: Finished scap sync-world: Backport for special: Do not throw ErrorPageError from getRedirect() (T398167), Set categorylinks to read new on small wikis (T397912) (duration: 08m 37s)
  • 21:11 zabe@deploy1003: kharlan, zabe: Continuing with sync
  • 21:09 zabe@deploy1003: kharlan, zabe: Backport for special: Do not throw ErrorPageError from getRedirect() (T398167), Set categorylinks to read new on small wikis (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:08 zabe@deploy1003: Started scap sync-world: Backport for special: Do not throw ErrorPageError from getRedirect() (T398167), Set categorylinks to read new on small wikis (T397912)
  • 20:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast1003.wikimedia.org
  • 20:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast6003.wikimedia.org to drbd
  • 20:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast1003.wikimedia.org
  • 20:47 arlolra@deploy1003: Finished scap sync-world: Backport for Use FallbackContentHandler for undeployed JsonConfig content handlers (T124748), ExtensionDistributor: Mark 1.44 as stable; remove 1.42 as EOL (T390798 T389313) (duration: 08m 27s)
  • 20:41 arlolra@deploy1003: arlolra, matmarex: Continuing with sync
  • 20:40 arlolra@deploy1003: arlolra, matmarex: Backport for Use FallbackContentHandler for undeployed JsonConfig content handlers (T124748), ExtensionDistributor: Mark 1.44 as stable; remove 1.42 as EOL (T390798 T389313) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:38 arlolra@deploy1003: Started scap sync-world: Backport for Use FallbackContentHandler for undeployed JsonConfig content handlers (T124748), ExtensionDistributor: Mark 1.44 as stable; remove 1.42 as EOL (T390798 T389313)
  • 20:36 cscott@deploy1003: Finished scap sync-world: Backport for skin: Omit "rendered with" phrase when the message is disabled (T398616) (duration: 08m 30s)
  • 20:30 cscott@deploy1003: cscott: Continuing with sync
  • 20:29 cscott@deploy1003: cscott: Backport for skin: Omit "rendered with" phrase when the message is disabled (T398616) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:27 cscott@deploy1003: Started scap sync-world: Backport for skin: Omit "rendered with" phrase when the message is disabled (T398616)
  • 20:12 zabe@deploy1003: Finished scap sync-world: Backport for Use correct index on categorylinks (T385890) (duration: 08m 32s)
  • 20:06 zabe@deploy1003: zabe: Continuing with sync
  • 20:05 zabe@deploy1003: zabe: Backport for Use correct index on categorylinks (T385890) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:03 zabe@deploy1003: Started scap sync-world: Backport for Use correct index on categorylinks (T385890)
  • 19:36 joal@deploy1003: Finished deploy [airflow-dags/analytics@7ba4a7b]: BUGFIX - Synchronize artifact for airflow_dags/analytics (duration: 01m 02s)
  • 19:35 joal@deploy1003: Started deploy [airflow-dags/analytics@7ba4a7b]: BUGFIX - Synchronize artifact for airflow_dags/analytics
  • 19:34 joal@deploy1003: Finished deploy [airflow-dags/analytics_test@7ba4a7b]: BUGFIX - Synchronize artifact for airflow_dags/analytics_test (duration: 00m 16s)
  • 19:34 joal@deploy1003: Started deploy [airflow-dags/analytics_test@7ba4a7b]: BUGFIX - Synchronize artifact for airflow_dags/analytics_test
  • 17:33 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1176.eqiad.wmnet with OS bullseye
  • 17:26 joal@deploy1003: Finished deploy [airflow-dags/analytics@9088e59]: Synchronize artifacts for airflow_dags/analytics (duration: 00m 40s)
  • 17:25 joal@deploy1003: Started deploy [airflow-dags/analytics@9088e59]: Synchronize artifacts for airflow_dags/analytics
  • 17:24 joal@deploy1003: Finished deploy [airflow-dags/analytics_test@9088e59]: Synchronize artifact for airflow_dags/analytics_test (duration: 00m 15s)
  • 17:24 joal@deploy1003: Started deploy [airflow-dags/analytics_test@9088e59]: Synchronize artifact for airflow_dags/analytics_test
  • 17:18 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1176.eqiad.wmnet with reason: host reimage
  • 17:15 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1176.eqiad.wmnet with reason: host reimage
  • 17:13 cmooney@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 17:13 cmooney@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 17:13 cmooney@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 17:12 cmooney@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 17:01 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
  • 16:32 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 16:32 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 16:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 16:31 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 16:11 vgutierrez: repooling cp7006
  • 16:09 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp7006.magru.wmnet
  • 16:09 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp7006.magru.wmnet
  • 15:52 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 15:52 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 15:46 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 15:46 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 15:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
  • 15:38 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
  • 15:34 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7006.magru.wmnet with reason: testing
  • 15:33 vgutierrez: depooling cp7006 for testing
  • 15:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T395241)', diff saved to https://phabricator.wikimedia.org/P78755 and previous config saved to /var/cache/conftool/dbconfig/20250703-153141-fceratto.json
  • 15:25 jmm@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:aux-worker-codfw
  • 15:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:22 vgutierrez: lvs5006 migrated to katran - T396561
  • 15:21 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs5006.eqsin.wmnet
  • 15:21 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs5006.eqsin.wmnet
  • 15:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P78754 and previous config saved to /var/cache/conftool/dbconfig/20250703-151633-fceratto.json
  • 15:10 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs5006.eqsin.wmnet with reason: katran migration
  • 15:04 jmm@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:aux-worker-codfw
  • 15:04 jmm@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:aux-worker-eqiad
  • 15:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P78753 and previous config saved to /var/cache/conftool/dbconfig/20250703-150126-fceratto.json
  • 14:56 jynus@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for backup1007.eqiad.wmnet: Renew puppet certificate - jynus@cumin1002
  • 14:55 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry1005.eqiad.wmnet
  • 14:51 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host registry1005.eqiad.wmnet
  • 14:50 jynus@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for backup1006.eqiad.wmnet: Renew puppet certificate - jynus@cumin1002
  • 14:50 volans: uploaded debmonitor-server,python3-debmonitor_0.6.6 to apt.wikimedia.org bookworm-wikimedia
  • 14:49 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry1004.eqiad.wmnet
  • 14:48 vgutierrez: repooling cp7006
  • 14:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T395241)', diff saved to https://phabricator.wikimedia.org/P78752 and previous config saved to /var/cache/conftool/dbconfig/20250703-144619-fceratto.json
  • 14:45 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host registry1004.eqiad.wmnet
  • 14:45 jmm@dns1004: END - running authdns-update
  • 14:44 jmm@dns1004: START - running authdns-update
  • 14:43 jmm@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:aux-worker-eqiad
  • 14:38 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2213 (T395241)', diff saved to https://phabricator.wikimedia.org/P78751 and previous config saved to /var/cache/conftool/dbconfig/20250703-143854-fceratto.json
  • 14:38 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 14:32 moritzm: installing bootstrap4 security updates
  • 14:23 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
  • 14:17 vgutierrez: depooling cp7006 for testing
  • 14:09 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup1007.eqiad.wmnet with reason: Maintenance and reboot
  • 14:08 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup1006.eqiad.wmnet with reason: Maintenance and reboot
  • 14:05 moritzm: restarting clamav to pick up libxml security updates
  • 14:03 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zookeeper-test1002.eqiad.wmnet
  • 13:59 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host zookeeper-test1002.eqiad.wmnet
  • 13:47 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1047.eqiad.wmnet
  • 13:46 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
  • 13:46 sukhe: sudo cumin 'A:wikidough' "disable-puppet 'merging CR 1163859'"
  • 13:45 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2005.codfw.wmnet
  • 13:40 moritzm: installing libxml2 security updates on bookworm
  • 13:40 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host registry2005.codfw.wmnet
  • 13:40 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2004.codfw.wmnet
  • 13:39 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2005-dev.codfw.wmnet with OS bullseye
  • 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum6001.drmrs.wmnet to drbd
  • 13:35 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host registry2004.codfw.wmnet
  • 13:28 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum6001.drmrs.wmnet to drbd
  • 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh6001.wikimedia.org to drbd
  • 13:22 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2005-dev.codfw.wmnet with reason: host reimage
  • 13:21 sukhe: sudo cumin -b11 'C:bird' "run-puppet-agent --enable 'merging CR 1163858'": NOOP change T374619
  • 13:20 TheresNoTime: done UTC afternoon backport window
  • 13:18 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2005-dev.codfw.wmnet with reason: host reimage
  • 13:18 samtar@deploy1003: Finished scap sync-world: Backport for InitialiseSettings: Enable wgTemplateDataEnableDiscovery as default (T377978), Allow abusefilter block action on plwikiquote (T398137) (duration: 14m 04s)
  • 13:18 sukhe: sudo cumin 'C:bird' "disable-puppet 'merging CR 1163858'": T374619
  • 13:17 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh6001.wikimedia.org to drbd
  • 13:11 samtar@deploy1003: samtar, eggroll97: Continuing with sync
  • 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir6001.drmrs.wmnet to drbd
  • 13:08 samtar@deploy1003: samtar, eggroll97: Backport for InitialiseSettings: Enable wgTemplateDataEnableDiscovery as default (T377978), Allow abusefilter block action on plwikiquote (T398137) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:04 samtar@deploy1003: Started scap sync-world: Backport for InitialiseSettings: Enable wgTemplateDataEnableDiscovery as default (T377978), Allow abusefilter block action on plwikiquote (T398137)
  • 12:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir6001.drmrs.wmnet to drbd
  • 12:59 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephmon2005-dev.codfw.wmnet with OS bullseye
  • 12:54 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@09893e3]: bump section topics to v1.7.0 (duration: 03m 20s)
  • 12:51 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@09893e3]: bump section topics to v1.7.0
  • 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install6002.wikimedia.org to drbd
  • 11:56 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:55 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:45 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
  • 11:45 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
  • 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install6002.wikimedia.org to drbd
  • 11:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti6003.drmrs.wmnet to cluster drmrs01 and group B12
  • 11:37 jynus@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for backup1005.eqiad.wmnet: Renew puppet certificate - jynus@cumin1002
  • 11:35 jiji@deploy1003: Finished scap sync-world: T397907 - Upgrade Excimer to 1.2.5 in production (duration: 06m 59s)
  • 11:30 jiji@deploy1003: Started scap sync-world: T397907 - Upgrade Excimer to 1.2.5 in production
  • 11:29 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti6003.drmrs.wmnet to cluster drmrs01 and group B12
  • 11:27 jiji@deploy1003: Unlocked for deployment [ALL REPOSITORIES]: T397907 - Upgrade Excimer to 1.2.5 in production in progress, blocking deploys (duration: 44m 16s)
  • 11:26 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 11:21 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 11:21 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 11:17 jynus@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for backup1004.eqiad.wmnet: Renew puppet certificate - jynus@cumin1002
  • 11:16 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:15 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:15 effie: starting staged rollout of Excimer to 1.2.5, mw-api-ext
  • 11:15 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 11:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
  • 11:11 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:11 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for rgw.codfw.dpe.anycast.wmnet - cmooney@cumin1003"
  • 11:11 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for rgw.codfw.dpe.anycast.wmnet - cmooney@cumin1003"
  • 11:07 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:06 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:05 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:05 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 11:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
  • 11:04 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:03 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 10:54 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 10:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:50 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 10:49 effie: starting staged rollout of Excimer to 1.2.5 mw-debug first, mw-api-int next
  • 10:47 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:44 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 10:43 jiji@deploy1003: Locking from deployment [ALL REPOSITORIES]: T397907 - Upgrade Excimer to 1.2.5 in production in progress, blocking deploys
  • 10:42 jiji@deploy1003: Stopping before sync operations
  • 10:26 jiji@deploy1003: Started scap sync-world: T397907 - Upgrade Excimer to 1.2.5 in production
  • 10:23 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup1005.eqiad.wmnet with reason: Maintenance and reboot
  • 10:12 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetserver2001.codfw.wmnet
  • 10:08 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:08 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host puppetserver2001.codfw.wmnet
  • 10:05 volans@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on debmonitor2003.codfw.wmnet,debmonitor1003.eqiad.wmnet,debmonitor-dev2001.codfw.wmnet with reason: deploy new version
  • 10:00 volans: upgrading production debmonitor-server to the latest v0.6.5
  • 09:39 fceratto@cumin1002: dbctl commit (dc=all): 'Set db2213 weights T398594', diff saved to https://phabricator.wikimedia.org/P78747 and previous config saved to /var/cache/conftool/dbconfig/20250703-093943-fceratto.json
  • 09:36 fceratto@cumin1002: dbctl commit (dc=all): 'Promote db2192 to s5 primary T398594', diff saved to https://phabricator.wikimedia.org/P78746 and previous config saved to /var/cache/conftool/dbconfig/20250703-093612-fceratto.json
  • 09:34 federico3: Starting s5 codfw failover from db2213 to db2192 - T398594
  • 09:31 vgutierrez: repooling cp7006
  • 09:30 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp7006.magru.wmnet
  • 09:30 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp7006.magru.wmnet
  • 09:25 fceratto@cumin1002: dbctl commit (dc=all): 'Remove db2192 from API/vslow/dump T398594', diff saved to https://phabricator.wikimedia.org/P78745 and previous config saved to /var/cache/conftool/dbconfig/20250703-092522-fceratto.json
  • 09:24 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 22 hosts with reason: Primary switchover s5 T398594
  • 09:21 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup1004.eqiad.wmnet with reason: Maintenance and reboot
  • 09:21 fceratto@cumin1002: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1:00:00 on 22 hosts with reason: Primary switchover s5 T398593
  • 09:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti6003.drmrs.wmnet with OS bookworm
  • 08:54 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1002.eqiad.wmnet
  • 08:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti6003.drmrs.wmnet with reason: host reimage
  • 08:50 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti6003.drmrs.wmnet with reason: host reimage
  • 08:48 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host krb1002.eqiad.wmnet
  • 08:42 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1048.eqiad.wmnet
  • 08:42 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet
  • 08:37 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet
  • 08:36 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1048.eqiad.wmnet
  • 08:32 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti6003.drmrs.wmnet with OS bookworm
  • 08:29 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti6003.drmrs.wmnet with reason: reimage
  • 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast6003.wikimedia.org to plain
  • 08:26 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast6003.wikimedia.org to plain
  • 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir6001.drmrs.wmnet to plain
  • 08:23 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir6001.drmrs.wmnet to plain
  • 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum6001.drmrs.wmnet to plain
  • 08:21 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum6001.drmrs.wmnet to plain
  • 08:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh6001.wikimedia.org to plain
  • 08:18 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh6001.wikimedia.org to plain
  • 08:15 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:14 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.8 refs T392178
  • 08:13 volans: uploaded debmonitor-server,python3-debmonitor_0.6.5 to apt.wikimedia.org bookworm-wikimedia
  • 08:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install6002.wikimedia.org to plain
  • 08:07 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install6002.wikimedia.org to plain
  • 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus6002.drmrs.wmnet to drbd
  • 07:53 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 07:53 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 07:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repool pc4 T378715', diff saved to https://phabricator.wikimedia.org/P78744 and previous config saved to /var/cache/conftool/dbconfig/20250703-075225-ladsgroup.json
  • 07:52 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 07:51 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 07:50 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 07:49 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 07:42 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7006.magru.wmnet with reason: haproxy testing
  • 07:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 07:39 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Feature: search in response reasons - oblivian@cumin1003"
  • 07:39 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Feature: search in response reasons - oblivian@cumin1003
  • 07:38 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Feature: search in response reasons - oblivian@cumin1003
  • 07:38 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Feature: search in response reasons - oblivian@cumin1003"
  • 07:34 effie: upload php-excimer_1.2.5-1+wmf11u1
  • 07:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for codeFolding: fix folding <ref> (T398430) (duration: 12m 16s)
  • 07:21 ladsgroup@deploy1003: musikanimal, ladsgroup: Continuing with sync
  • 07:18 vgutierrez: depooling cp7006 for requestctl debugging
  • 07:16 ladsgroup@deploy1003: musikanimal, ladsgroup: Backport for codeFolding: fix folding <ref> (T398430) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:14 ladsgroup@deploy1003: Started scap sync-world: Backport for codeFolding: fix folding <ref> (T398430)
  • 07:03 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus6002.drmrs.wmnet to drbd
  • 07:02 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on prometheus6002.drmrs.wmnet with reason: switch disk type back to DRBD
  • 07:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh6002.wikimedia.org to drbd
  • 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6003.drmrs.wmnet
  • 06:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
  • 06:47 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh6002.wikimedia.org to drbd
  • 06:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum6002.drmrs.wmnet to drbd
  • 06:34 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum6002.drmrs.wmnet to drbd
  • 03:38 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm
  • 03:22 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
  • 03:18 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
  • 03:06 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 01:56 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 01:56 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 01:56 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 01:56 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 01:56 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 01:56 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 01:56 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 01:55 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 01:55 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 01:55 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 01:55 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 01:55 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 01:54 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 01:54 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 01:54 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 01:54 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 01:54 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 01:54 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 01:54 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 01:54 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 01:54 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 01:53 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 01:53 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 01:53 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 00:09 swfrench-wmf: reprepro include php-msgpack_3.0.0-1+wmf11u1 in component/php83 - T398245
  • 00:08 swfrench-wmf: reprepro include php-igbinary_3.2.16-4+wmf11u1 in component/php83 - T398245
  • 00:03 swfrench-wmf: reprepro include php-apcu_5.1.24-1+wmf11u1 in component/php83 - T398245

2025-07-02

  • 23:40 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1053.eqiad.wmnet with OS bookworm
  • 23:38 tzatziki: removing 15 files for legal compliance
  • 23:25 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm
  • 23:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 23:07 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 23:05 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 23:05 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 23:02 ryankemper: [WDQS] `ryankemper@wdqs2009:~$ sudo systemctl restart prometheus-blazegraph-exporter-wdqs-blazegraph.service`
  • 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 22:51 dancy@deploy1003: Installation of scap version "4.186.0" completed for 2 hosts
  • 22:49 dancy@deploy1003: Installing scap version "4.186.0" for 2 host(s)
  • 22:49 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 22:40 ryankemper: [WDQS] Restart wdqs-blazegraph on wdqs2009
  • 22:27 zabe@deploy1003: Finished scap sync-world: Backport for ApiQueryCategoryMembers: Use correct index for categorylinks (T385890 T398448) (duration: 09m 12s)
  • 22:21 zabe@deploy1003: zabe: Continuing with sync
  • 22:21 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 22:20 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1053.eqiad.wmnet with OS bookworm
  • 22:19 zabe@deploy1003: zabe: Backport for ApiQueryCategoryMembers: Use correct index for categorylinks (T385890 T398448) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:19 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 22:19 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:17 zabe@deploy1003: Started scap sync-world: Backport for ApiQueryCategoryMembers: Use correct index for categorylinks (T385890 T398448)
  • 22:12 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:07 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 22:02 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:59 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:55 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:49 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:23 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 21:23 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 21:22 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 21:22 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 21:20 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 21:20 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 21:16 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 21:15 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 21:14 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 21:13 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 21:12 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 21:11 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 21:05 krinkle@deploy1003: Finished scap sync-world: Backport for missing.php: Support beta suffix for auth.wikimedia error page (T289318) (duration: 29m 54s)
  • 20:59 krinkle@deploy1003: krinkle: Continuing with sync
  • 20:37 krinkle@deploy1003: krinkle: Backport for missing.php: Support beta suffix for auth.wikimedia error page (T289318) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:35 krinkle@deploy1003: Started scap sync-world: Backport for missing.php: Support beta suffix for auth.wikimedia error page (T289318)
  • 20:34 swfrench-wmf: reprepro include dh-php_5.5+wmf11u1 in component/php83 - T398245
  • 20:31 krinkle@deploy1003: Finished scap sync-world: Beta patches Iff58893f, I62b31535, I228d7766a57 (duration: 03m 06s)
  • 20:30 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2004-dev.codfw.wmnet with OS bookworm
  • 20:29 swfrench-wmf: reprepro include php-defaults_94+wmf11u1 in component/php83 - T398245
  • 20:28 krinkle@deploy1003: Started scap sync-world: Beta patches Iff58893f, I62b31535, I228d7766a57
  • 20:10 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 20:06 Krinkle: krinkle@deploy1003:/srv/mediawiki$ git remote rm gerrit -- Fix `jforrester@gerrit.wikimedia.org: Permission denied (publickey).` There were two remotes: $ git remote -v gerrit ssh://jforrester@gerrit origin ssh://gerrit.wikimedia.org:29418
  • 20:06 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 19:47 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephmon2004-dev.codfw.wmnet with OS bookworm
  • 18:42 swfrench-wmf: reprepro include php8.3_8.3.22-1+wmf11u1 in component/php83 - T398245
  • 17:53 swfrench-wmf: reprepro update component/php83 with pcre2 10.42-1~wmf11+1 - T398245
  • 17:47 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2330.codfw.wmnet
  • 17:41 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2330.codfw.wmnet
  • 17:41 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2329.codfw.wmnet
  • 17:36 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2329.codfw.wmnet
  • 17:36 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2328.codfw.wmnet
  • 17:34 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2328.codfw.wmnet
  • 17:34 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2327.codfw.wmnet
  • 17:31 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2327.codfw.wmnet
  • 17:31 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2326.codfw.wmnet
  • 17:29 dzahn@dns1004: END - running authdns-update
  • 17:28 dzahn@dns1004: START - running authdns-update
  • 17:26 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2326.codfw.wmnet
  • 17:26 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2325.codfw.wmnet
  • 17:21 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2325.codfw.wmnet
  • 17:21 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2324.codfw.wmnet
  • 17:15 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2324.codfw.wmnet
  • 17:15 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2323.codfw.wmnet
  • 17:10 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2323.codfw.wmnet
  • 17:10 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2322.codfw.wmnet
  • 17:04 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2322.codfw.wmnet
  • 17:04 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2321.codfw.wmnet
  • 16:58 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2321.codfw.wmnet
  • 16:58 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2320.codfw.wmnet
  • 16:53 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2320.codfw.wmnet
  • 16:53 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2319.codfw.wmnet
  • 16:48 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2319.codfw.wmnet
  • 16:48 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2318.codfw.wmnet
  • 16:47 inflatador: bking@cumin1002 restarting cirrussearch codfw T397227
  • 16:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 16:43 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 16:43 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2318.codfw.wmnet
  • 16:43 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2317.codfw.wmnet
  • 16:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 16:39 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 16:39 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2317.codfw.wmnet
  • 16:39 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2316.codfw.wmnet
  • 16:33 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2316.codfw.wmnet
  • 16:33 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2315.codfw.wmnet
  • 16:28 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2315.codfw.wmnet
  • 16:28 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2314.codfw.wmnet
  • 16:22 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2314.codfw.wmnet
  • 16:22 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2313.codfw.wmnet
  • 16:17 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2313.codfw.wmnet
  • 16:17 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2312.codfw.wmnet
  • 16:13 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 16:12 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2312.codfw.wmnet
  • 16:12 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2311.codfw.wmnet
  • 16:10 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-main: apply
  • 16:10 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-main: apply
  • 16:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 16:06 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2311.codfw.wmnet
  • 16:06 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2310.codfw.wmnet
  • 16:01 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2310.codfw.wmnet
  • 16:01 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2309.codfw.wmnet
  • 15:56 vgutierrez: switch lvs4010 to katran - 10.128.0.11
  • 15:56 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2309.codfw.wmnet
  • 15:56 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2308.codfw.wmnet
  • 15:55 jnuche@deploy1003: Finished scap sync-world: Backport for Rename EventRegistration::$meetingAddress to $address for cache compat (T398413) (duration: 08m 51s)
  • 15:53 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2308.codfw.wmnet
  • 15:53 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2307.codfw.wmnet
  • 15:49 jnuche@deploy1003: jnuche, daimona: Continuing with sync
  • 15:49 jnuche@deploy1003: jnuche, daimona: Backport for Rename EventRegistration::$meetingAddress to $address for cache compat (T398413) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:48 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2307.codfw.wmnet
  • 15:48 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2306.codfw.wmnet
  • 15:47 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs4010.ulsfo.wmnet with reason: katran migration
  • 15:46 jnuche@deploy1003: Started scap sync-world: Backport for Rename EventRegistration::$meetingAddress to $address for cache compat (T398413)
  • 15:42 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2306.codfw.wmnet
  • 15:42 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2305.codfw.wmnet
  • 15:38 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2305.codfw.wmnet
  • 15:38 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2304.codfw.wmnet
  • 15:33 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2304.codfw.wmnet
  • 15:33 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2303.codfw.wmnet
  • 15:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir6002.drmrs.wmnet to drbd
  • 15:28 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2303.codfw.wmnet
  • 15:28 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2302.codfw.wmnet
  • 15:22 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2302.codfw.wmnet
  • 15:22 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2301.codfw.wmnet
  • 15:20 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir6002.drmrs.wmnet to drbd
  • 15:17 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2301.codfw.wmnet
  • 15:17 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2300.codfw.wmnet
  • 15:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow6001.drmrs.wmnet to drbd
  • 15:15 vgutierrez: repool cp7006
  • 15:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc2014
  • 15:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host pc2014
  • 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:11 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2300.codfw.wmnet
  • 15:11 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2299.codfw.wmnet
  • 15:11 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:08 dancy@deploy1003: Installation of scap version "4.185.0" completed for 2 hosts
  • 15:06 jiji@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=mw-api-ext-ro,name=eqiad
  • 15:06 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2299.codfw.wmnet
  • 15:06 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2298.codfw.wmnet
  • 15:06 dancy@deploy1003: Installing scap version "4.185.0" for 2 host(s)
  • 15:05 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow6001.drmrs.wmnet to drbd
  • 15:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti6002.drmrs.wmnet to cluster drmrs02 and group B13
  • 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti6002.drmrs.wmnet to cluster drmrs02 and group B13
  • 15:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
  • 15:01 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2298.codfw.wmnet
  • 15:01 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2297.codfw.wmnet
  • 15:00 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:57 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc2014
  • 14:56 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc2014
  • 14:55 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2297.codfw.wmnet
  • 14:55 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2296.codfw.wmnet
  • 14:55 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
  • 14:52 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 14:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti6002.drmrs.wmnet with OS bookworm
  • 14:50 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2296.codfw.wmnet
  • 14:50 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2295.codfw.wmnet
  • 14:47 jiji@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=mw-api-ext-ro,name=eqiad
  • 14:45 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2295.codfw.wmnet
  • 14:44 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2294.codfw.wmnet
  • 14:42 godog: bounce thanos-store on titan1002
  • 14:40 oblivian@deploy1003: Finished scap sync-world: Backport for Revert "group1: Set categorylinks to read new" (duration: 08m 26s)
  • 14:39 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2294.codfw.wmnet
  • 14:39 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2293.codfw.wmnet
  • 14:39 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@1bb179b]: bump section topics to v1.6.0 (duration: 00m 47s)
  • 14:38 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@1bb179b]: bump section topics to v1.6.0
  • 14:38 bking@cumin1002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: activate new plugins packages - bking@cumin1002 - T397227
  • 14:38 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: activate new plugins packages - bking@cumin1002 - T397227
  • 14:36 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 14:35 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 14:35 oblivian@deploy1003: zabe, oblivian: Continuing with sync
  • 14:34 oblivian@deploy1003: zabe, oblivian: Backport for Revert "group1: Set categorylinks to read new" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:34 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2293.codfw.wmnet
  • 14:34 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2292.codfw.wmnet
  • 14:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti6002.drmrs.wmnet with reason: host reimage
  • 14:31 oblivian@deploy1003: Started scap sync-world: Backport for Revert "group1: Set categorylinks to read new"
  • 14:31 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1048.eqiad.wmnet
  • 14:31 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1048.eqiad.wmnet
  • 14:30 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:28 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2292.codfw.wmnet
  • 14:28 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2291.codfw.wmnet
  • 14:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti6002.drmrs.wmnet with reason: host reimage
  • 14:23 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2291.codfw.wmnet
  • 14:23 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2290.codfw.wmnet
  • 14:18 zabe@deploy1003: Finished scap sync-world: retry revert (duration: 04m 27s)
  • 14:18 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2290.codfw.wmnet
  • 14:17 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2289.codfw.wmnet
  • 14:14 bking@cumin1002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: activate new plugins packages - bking@cumin1002 - T397227
  • 14:14 zabe@deploy1003: Started scap sync-world: retry revert
  • 14:12 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2289.codfw.wmnet
  • 14:12 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2288.codfw.wmnet
  • 14:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti6002.drmrs.wmnet with OS bookworm
  • 14:08 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2288.codfw.wmnet
  • 14:07 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2287.codfw.wmnet
  • 14:06 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti6002.drmrs.wmnet with reason: reimage
  • 14:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus6002.drmrs.wmnet to plain
  • 14:03 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2287.codfw.wmnet
  • 14:02 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2286.codfw.wmnet
  • 14:01 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus6002.drmrs.wmnet to plain
  • 13:53 bking@cumin1002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: activate new plugins packages - bking@cumin1002 - T397227
  • 13:53 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: activate new plugins packages - bking@cumin1002 - T397227
  • 13:52 zabe@deploy1003: sync-world aborted: T397912 (duration: 04m 03s)
  • 13:48 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2283.codfw.wmnet
  • 13:41 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2282.codfw.wmnet
  • 13:40 zabe@deploy1003: Started scap sync-world: T397912
  • 13:39 _joe_: repooling cp7006, testing logging improvements
  • 13:37 vgutierrez: switch upload@eqsin to the new upload cert - T394484
  • 13:35 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2282.codfw.wmnet
  • 13:35 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2281.codfw.wmnet
  • 13:30 zabe@deploy1003: zabe: Continuing with sync
  • 13:30 moritzm: failover Ganeti master in drmrs02 to ganeti6004 T382513
  • 13:30 zabe@deploy1003: zabe: Backport for group1: Set categorylinks to read new (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:29 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2281.codfw.wmnet
  • 13:29 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2280.codfw.wmnet
  • 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6002.drmrs.wmnet
  • 13:27 zabe@deploy1003: Started scap sync-world: Backport for group1: Set categorylinks to read new (T397912)
  • 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6002.drmrs.wmnet
  • 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh6002.wikimedia.org to drbd
  • 13:24 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2280.codfw.wmnet
  • 13:24 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2279.codfw.wmnet
  • 13:21 _joe_: depooling cp7006 for testing
  • 13:18 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2279.codfw.wmnet
  • 13:18 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2278.codfw.wmnet
  • 13:18 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh6002.wikimedia.org to drbd
  • 13:18 moritzm: installing rsyslog bugfix updates from Bookworm point release
  • 13:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum6002.drmrs.wmnet to drbd
  • 13:17 samtar@deploy1003: Finished scap sync-world: Backport for Assign oathauth-verify-user to default bureaucrat (T265726), Add abusefilter-revert to sysops on testwiki (T398107) (duration: 11m 16s)
  • 13:13 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2278.codfw.wmnet
  • 13:13 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2277.codfw.wmnet
  • 13:13 jgreen@dns1004: END - running authdns-update
  • 13:11 jgreen@dns1004: START - running authdns-update
  • 13:11 samtar@deploy1003: samtar, eggroll97: Continuing with sync
  • 13:08 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum6002.drmrs.wmnet to drbd
  • 13:08 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2277.codfw.wmnet
  • 13:08 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2276.codfw.wmnet
  • 13:08 samtar@deploy1003: samtar, eggroll97: Backport for Assign oathauth-verify-user to default bureaucrat (T265726), Add abusefilter-revert to sysops on testwiki (T398107) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir6002.drmrs.wmnet to drbd
  • 13:05 samtar@deploy1003: Started scap sync-world: Backport for Assign oathauth-verify-user to default bureaucrat (T265726), Add abusefilter-revert to sysops on testwiki (T398107)
  • 13:02 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2276.codfw.wmnet
  • 13:02 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2275.codfw.wmnet
  • 12:58 urbanecm@deploy1003: Finished scap sync-world: Backport for [Growth] Move Impact limit configuration to ext-GrowthExperiments (T341599), [Growth] enwiki: Decrease wgGEUserImpactMaxEdits to 1000 (T398418 T341599) (duration: 09m 42s)
  • 12:57 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2275.codfw.wmnet
  • 12:57 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2274.codfw.wmnet
  • 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir6002.drmrs.wmnet to drbd
  • 12:52 urbanecm@deploy1003: urbanecm: Continuing with sync
  • 12:52 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2274.codfw.wmnet
  • 12:52 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2273.codfw.wmnet
  • 12:51 urbanecm@deploy1003: urbanecm: Backport for [Growth] Move Impact limit configuration to ext-GrowthExperiments (T341599), [Growth] enwiki: Decrease wgGEUserImpactMaxEdits to 1000 (T398418 T341599) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:49 urbanecm@deploy1003: Started scap sync-world: Backport for [Growth] Move Impact limit configuration to ext-GrowthExperiments (T341599), [Growth] enwiki: Decrease wgGEUserImpactMaxEdits to 1000 (T398418 T341599)
  • 12:47 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2273.codfw.wmnet
  • 12:47 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2272.codfw.wmnet
  • 12:41 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2272.codfw.wmnet
  • 12:41 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2271.codfw.wmnet
  • 12:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus6002.drmrs.wmnet to drbd
  • 12:36 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2271.codfw.wmnet
  • 12:36 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2270.codfw.wmnet
  • 12:30 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2270.codfw.wmnet
  • 12:30 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2269.codfw.wmnet
  • 12:25 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2269.codfw.wmnet
  • 12:25 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2268.codfw.wmnet
  • 12:20 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2268.codfw.wmnet
  • 12:20 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2267.codfw.wmnet
  • 12:14 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2267.codfw.wmnet
  • 12:14 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2266.codfw.wmnet
  • 12:14 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 12:10 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 12:10 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 12:09 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2266.codfw.wmnet
  • 12:09 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2265.codfw.wmnet
  • 12:08 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:08 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:07 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 12:06 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 12:04 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2265.codfw.wmnet
  • 12:04 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2264.codfw.wmnet
  • 11:58 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2264.codfw.wmnet
  • 11:58 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2263.codfw.wmnet
  • 11:53 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2263.codfw.wmnet
  • 11:52 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2262.codfw.wmnet
  • 11:47 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2262.codfw.wmnet
  • 11:47 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2261.codfw.wmnet
  • 11:47 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 11:42 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 11:42 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2261.codfw.wmnet
  • 11:42 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2260.codfw.wmnet
  • 11:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2004.codfw.wmnet
  • 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus6002.drmrs.wmnet to drbd
  • 11:37 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2260.codfw.wmnet
  • 11:37 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2259.codfw.wmnet
  • 11:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 37271
  • 11:33 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 37271
  • 11:33 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-gp2004.codfw.wmnet
  • 11:31 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2259.codfw.wmnet
  • 11:31 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2258.codfw.wmnet
  • 11:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow6001.drmrs.wmnet to drbd
  • 11:26 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2258.codfw.wmnet
  • 11:26 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2257.codfw.wmnet
  • 11:21 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2257.codfw.wmnet
  • 11:20 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2256.codfw.wmnet
  • 11:19 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow6001.drmrs.wmnet to drbd
  • 11:16 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of netflow6001.drmrs.wmnet to drbd
  • 11:16 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow6001.drmrs.wmnet to drbd
  • 11:15 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2256.codfw.wmnet
  • 11:15 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2255.codfw.wmnet
  • 11:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti6004.drmrs.wmnet to cluster drmrs02 and group B13
  • 11:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti6004.drmrs.wmnet to cluster drmrs02 and group B13
  • 11:10 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2255.codfw.wmnet
  • 11:09 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2254.codfw.wmnet
  • 11:04 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2254.codfw.wmnet
  • 11:04 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2253.codfw.wmnet
  • 11:00 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2253.codfw.wmnet
  • 11:00 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2252.codfw.wmnet
  • 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
  • 10:55 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2252.codfw.wmnet
  • 10:55 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2251.codfw.wmnet
  • 10:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
  • 10:50 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2251.codfw.wmnet
  • 10:50 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 10:49 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2250.codfw.wmnet
  • 10:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 10:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 37271
  • 10:48 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 37271
  • 10:47 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:47 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 10:47 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 10:47 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 137236
  • 10:47 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 10:46 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 137236
  • 10:44 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2250.codfw.wmnet
  • 10:44 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2249.codfw.wmnet
  • 10:44 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 10:43 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:43 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 10:42 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 10:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti6004.drmrs.wmnet with OS bookworm
  • 10:39 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2249.codfw.wmnet
  • 10:39 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2248.codfw.wmnet
  • 10:35 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 10:35 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 10:33 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2248.codfw.wmnet
  • 10:33 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 10:32 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 10:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 10:28 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 10:28 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 10:28 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 10:27 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 10:27 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 10:26 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 10:21 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1092.eqiad.wmnet with OS bullseye
  • 10:21 mvernon@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1003"
  • 10:21 mvernon@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1003"
  • 10:18 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1093.eqiad.wmnet with OS bullseye
  • 10:18 mvernon@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1003"
  • 10:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti6004.drmrs.wmnet with reason: host reimage
  • 10:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti6004.drmrs.wmnet with reason: host reimage
  • 10:13 mvernon@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1003"
  • 10:08 kharlan@deploy1003: Finished scap sync-world: Backport for UserInfoCard: prevent default link behavior with "click" (T398323) (duration: 09m 52s)
  • 10:04 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts backup1001.eqiad.wmnet
  • 10:04 jynus@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:04 jynus@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: backup1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
  • 10:04 jynus@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: backup1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
  • 10:03 kharlan@deploy1003: kharlan: Continuing with sync
  • 10:02 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1092.eqiad.wmnet with reason: host reimage
  • 10:01 kharlan@deploy1003: kharlan: Backport for UserInfoCard: prevent default link behavior with "click" (T398323) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:00 jynus@cumin1002: START - Cookbook sre.dns.netbox
  • 09:58 kharlan@deploy1003: Started scap sync-world: Backport for UserInfoCard: prevent default link behavior with "click" (T398323)
  • 09:57 mvernon@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1092.eqiad.wmnet with reason: host reimage
  • 09:55 jynus@cumin1002: START - Cookbook sre.hosts.decommission for hosts backup1001.eqiad.wmnet
  • 09:54 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti6004.drmrs.wmnet with OS bookworm
  • 09:54 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1093.eqiad.wmnet with reason: host reimage
  • 09:53 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti6004.drmrs.wmnet with reason: reimage
  • 09:50 mvernon@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1093.eqiad.wmnet with reason: host reimage
  • 09:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh6002.wikimedia.org to plain
  • 09:49 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh6002.wikimedia.org to plain
  • 09:49 vgutierrez: acme-chief: stop issuing RSA certificates by default - T398020
  • 09:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum6002.drmrs.wmnet to plain
  • 09:47 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts backup2001.codfw.wmnet
  • 09:47 jynus@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:47 jynus@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: backup2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
  • 09:47 jynus@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: backup2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
  • 09:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum6002.drmrs.wmnet to plain
  • 09:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus6002.drmrs.wmnet to plain
  • 09:45 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus6002.drmrs.wmnet to plain
  • 09:44 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Bugfixes: api auth and bwlimit rules - oblivian@cumin1003"
  • 09:44 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Bugfixes: api auth and bwlimit rules - oblivian@cumin1003
  • 09:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir6002.drmrs.wmnet to plain
  • 09:43 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Bugfixes: api auth and bwlimit rules - oblivian@cumin1003
  • 09:43 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Bugfixes: api auth and bwlimit rules - oblivian@cumin1003"
  • 09:42 jynus@cumin1002: START - Cookbook sre.dns.netbox
  • 09:42 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir6002.drmrs.wmnet to plain
  • 09:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow6001.drmrs.wmnet to plain
  • 09:39 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow6001.drmrs.wmnet to plain
  • 09:37 jynus@cumin1002: START - Cookbook sre.hosts.decommission for hosts backup2001.codfw.wmnet
  • 09:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6004.drmrs.wmnet
  • 09:36 zabe@deploy1003: Finished scap sync-world: Backport for Reapply "categorylinks: Set group0 to read new" (T397912) (duration: 10m 15s)
  • 09:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6004.drmrs.wmnet
  • 09:30 mvernon@cumin1003: START - Cookbook sre.hosts.reimage for host ms-be1092.eqiad.wmnet with OS bullseye
  • 09:29 mvernon@cumin1003: START - Cookbook sre.hosts.reimage for host ms-be1093.eqiad.wmnet with OS bullseye
  • 09:28 zabe@deploy1003: zabe: Continuing with sync
  • 09:27 zabe@deploy1003: zabe: Backport for Reapply "categorylinks: Set group0 to read new" (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:25 zabe@deploy1003: Started scap sync-world: Backport for Reapply "categorylinks: Set group0 to read new" (T397912)
  • 09:23 zabe@deploy1003: Finished scap sync-world: Backport for Fix categorylinks join order and use index on correct table (T398380) (duration: 08m 26s)
  • 09:18 zabe@deploy1003: zabe: Continuing with sync
  • 09:17 zabe@deploy1003: zabe: Backport for Fix categorylinks join order and use index on correct table (T398380) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:15 zabe@deploy1003: Started scap sync-world: Backport for Fix categorylinks join order and use index on correct table (T398380)
  • 09:06 volans: uploaded debmonitor-server,python3-debmonitor_0.6.4 to apt.wikimedia.org bookworm-wikimedia
  • 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3006.esams.wmnet
  • 09:06 jmm@dns1004: END - running authdns-update
  • 09:05 jmm@dns1004: START - running authdns-update
  • 09:04 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3006.esams.wmnet
  • 09:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti3006.esams.wmnet to cluster esams02 and group BW27
  • 09:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti3006.esams.wmnet to cluster esams02 and group BW27
  • 09:01 moritzm: rebalance ganeti/eqsin following Bookworm reimages
  • 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5007.eqsin.wmnet to cluster eqsin and group 1
  • 08:56 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5007.eqsin.wmnet to cluster eqsin and group 1
  • 08:53 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
  • 08:43 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
  • 08:34 jmm@dns1004: END - running authdns-update
  • 08:33 jmm@dns1004: START - running authdns-update
  • 08:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5007.eqsin.wmnet with OS bookworm
  • 08:28 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetserver2004.codfw.wmnet
  • 08:20 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host puppetserver2004.codfw.wmnet
  • 08:16 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.8 refs T392178
  • 08:10 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetserver1003.eqiad.wmnet
  • 08:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5007.eqsin.wmnet with reason: host reimage
  • 08:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5007.eqsin.wmnet with reason: host reimage
  • 08:02 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host puppetserver1003.eqiad.wmnet
  • 07:50 jmm@dns1004: END - running authdns-update
  • 07:49 jmm@dns1004: START - running authdns-update
  • 07:40 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5007.eqsin.wmnet with OS bookworm
  • 07:38 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti5007.eqsin.wmnet with reason: reimage
  • 06:29 Amir1: dropping l10n_cache table everywhere (T397367)
  • 06:28 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on pc2014.codfw.wmnet,pc1014.eqiad.wmnet with reason: Switch to 10G (T378715)
  • 06:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool pc4 T378715', diff saved to https://phabricator.wikimedia.org/P78735 and previous config saved to /var/cache/conftool/dbconfig/20250702-061517-ladsgroup.json
  • 06:04 slyngshede@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.11 to netbox-next - slyngshede@cumin1002 - T397300
  • 06:02 slyngshede@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.11 to netbox-next - slyngshede@cumin1002 - T397300
  • 05:58 slyngshede@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.11 to netbox-next - slyngshede@cumin1002 - T397300
  • 05:57 slyngshede@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.11 to netbox-next - slyngshede@cumin1002 - T397300
  • 02:50 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2004-dev.codfw.wmnet with OS bullseye
  • 02:32 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 02:28 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 02:12 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephmon2004-dev.codfw.wmnet with OS bullseye
  • 00:53 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm

2025-07-01

  • 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1093.eqiad.wmnet with reason: host reimage
  • 23:27 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1093.eqiad.wmnet with reason: host reimage
  • 23:26 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1092.eqiad.wmnet with reason: host reimage
  • 23:22 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1092.eqiad.wmnet with reason: host reimage
  • 23:19 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1054.eqiad.wmnet with OS bookworm
  • 23:16 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1053.eqiad.wmnet with OS bookworm
  • 23:08 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be1093.eqiad.wmnet with OS bullseye
  • 23:03 zabe@deploy1003: Finished scap sync-world: Backport for Revert "categorylinks: Set group0 to read new" (T397912 T398380) (duration: 08m 49s)
  • 22:58 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2043.codfw.wmnet with OS bullseye
  • 22:57 zabe@deploy1003: zabe: Continuing with sync
  • 22:57 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
  • 22:57 zabe@deploy1003: zabe: Backport for Revert "categorylinks: Set group0 to read new" (T397912 T398380) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be1092.eqiad.wmnet with OS bullseye
  • 22:54 zabe@deploy1003: Started scap sync-world: Backport for Revert "categorylinks: Set group0 to read new" (T397912 T398380)
  • 22:54 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:54 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sync - dzahn@cumin1002"
  • 22:54 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sync - dzahn@cumin1002"
  • 22:54 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:53 zabe@deploy1003: Finished scap sync-world: Backport for categorylinks: Set group0 to read new (T397912) (duration: 08m 40s)
  • 22:49 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 22:48 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 22:48 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1093.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:47 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 22:47 zabe@deploy1003: zabe: Continuing with sync
  • 22:46 zabe@deploy1003: zabe: Backport for categorylinks: Set group0 to read new (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:45 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts miscweb1003.eqiad.wmnet
  • 22:45 dzahn@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 22:44 zabe@deploy1003: Started scap sync-world: Backport for categorylinks: Set group0 to read new (T397912)
  • 22:44 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 22:44 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 22:36 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 22:35 toyofuku@deploy1003: Finished scap sync-world: Backport for Update mobile search overlay temporary input styles (duration: 29m 56s)
  • 22:35 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1092.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 22:34 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 22:31 dzahn@cumin1002: START - Cookbook sre.hosts.decommission for hosts miscweb1003.eqiad.wmnet
  • 22:30 toyofuku@deploy1003: bwang, toyofuku: Continuing with sync
  • 22:28 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:28 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts miscweb2003.codfw.wmnet
  • 22:28 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:28 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: miscweb2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1002"
  • 22:28 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: miscweb2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1002"
  • 22:26 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 22:23 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 22:22 ejegg: payments-wiki upgraded from a92f03c3 to 9c7f3a73
  • 22:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ms-be1092.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ms-be1093.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:18 dzahn@cumin1002: START - Cookbook sre.hosts.decommission for hosts miscweb2003.codfw.wmnet
  • 22:17 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:17 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns ms-be1092,934 - jclark@cumin1002"
  • 22:17 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns ms-be1092,934 - jclark@cumin1002"
  • 22:14 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 22:10 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 22:10 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 22:09 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 22:07 toyofuku@deploy1003: bwang, toyofuku: Backport for Update mobile search overlay temporary input styles synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:06 ejegg: fundraising scheduled jobs restarted
  • 22:05 toyofuku@deploy1003: Started scap sync-world: Backport for Update mobile search overlay temporary input styles
  • 22:04 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 22:02 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on miscweb2003.codfw.wmnet with reason: decom
  • 22:01 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on miscweb1003.eqiad.wmnet with reason: decom
  • 21:59 toyofuku@deploy1003: Finished scap sync-world: Backport for Enable mobile search recommendations in all eligible wikis except enwiki (duration: 10m 10s)
  • 21:59 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1054.eqiad.wmnet with OS bookworm
  • 21:56 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1053.eqiad.wmnet with OS bookworm
  • 21:55 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ganeti1053.eqiad.wmnet with OS bookworm
  • 21:55 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1053.eqiad.wmnet with OS bookworm
  • 21:54 toyofuku@deploy1003: toyofuku, bwang: Continuing with sync
  • 21:51 toyofuku@deploy1003: toyofuku, bwang: Backport for Enable mobile search recommendations in all eligible wikis except enwiki synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:51 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:51 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:50 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:49 toyofuku@deploy1003: Started scap sync-world: Backport for Enable mobile search recommendations in all eligible wikis except enwiki
  • 21:48 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:45 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:25 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2004-dev.codfw.wmnet with OS bookworm
  • 21:06 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 21:03 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 20:48 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:46 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1005.eqiad.wmnet with OS bullseye
  • 20:45 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:45 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephmon2004-dev.codfw.wmnet with OS bookworm
  • 20:41 cjming@deploy1003: Finished scap sync-world: Backport for zhwiki: Permissions change for abusefilter groups (T397788) (duration: 10m 35s)
  • 20:39 ejegg: fundraising civicrm upgraded from 5ae93148 to 521d0dbe
  • 20:36 cjming@deploy1003: zhaofjx, cjming: Continuing with sync
  • 20:33 cjming@deploy1003: zhaofjx, cjming: Backport for zhwiki: Permissions change for abusefilter groups (T397788) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:31 cjming@deploy1003: Started scap sync-world: Backport for zhwiki: Permissions change for abusefilter groups (T397788)
  • 20:26 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:26 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:24 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1005.eqiad.wmnet with reason: host reimage
  • 20:20 eevans@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1005.eqiad.wmnet with reason: host reimage
  • 20:20 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:04 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host sessionstore1005.eqiad.wmnet with OS bullseye
  • 20:04 eevans@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore1005.eqiad.wmnet with OS bullseye
  • 20:03 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest2001.codfw.wmnet with reason: T383173
  • 20:02 ejegg: disabled queue consumers for segment updates
  • 19:50 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:50 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:47 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host sessionstore1005.eqiad.wmnet with OS bullseye
  • 19:43 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2004-dev.codfw.wmnet with OS bookworm
  • 19:42 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:37 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:26 kemayo@deploy1003: Finished scap sync-world: Backport for Edit check: fix counter logging for SLO (T395444) (duration: 09m 07s)
  • 19:23 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 19:20 kemayo@deploy1003: kemayo: Continuing with sync
  • 19:20 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 19:19 kemayo@deploy1003: kemayo: Backport for Edit check: fix counter logging for SLO (T395444) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 19:17 kemayo@deploy1003: Started scap sync-world: Backport for Edit check: fix counter logging for SLO (T395444)
  • 19:00 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephmon2004-dev.codfw.wmnet with OS bookworm
  • 19:00 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest2001.codfw.wmnet with reason: T383173
  • 17:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5007.eqsin.wmnet
  • 16:56 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd2003-dev.codfw.wmnet
  • 16:56 andrew@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:56 andrew@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd2003-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1003"
  • 16:55 andrew@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd2003-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1003"
  • 16:51 andrew@cumin1003: START - Cookbook sre.dns.netbox
  • 16:45 andrew@cumin1003: START - Cookbook sre.hosts.decommission for hosts cloudcephosd2003-dev.codfw.wmnet
  • 16:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repool pc3 T378715', diff saved to https://phabricator.wikimedia.org/P78734 and previous config saved to /var/cache/conftool/dbconfig/20250701-164405-ladsgroup.json
  • 16:37 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 16:37 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 16:13 swfrench@deploy1003: Finished scap sync-world: Backport for Remove title-case overrides for PHP 8.1 migration (T394556) (duration: 09m 21s)
  • 16:11 inflatador: bking@prometheus1005:~$ sudo run-puppet-agent T398341
  • 16:10 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:08 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 16:07 swfrench@deploy1003: swfrench: Continuing with sync
  • 16:06 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc2013
  • 16:06 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc2013
  • 16:06 swfrench@deploy1003: swfrench: Backport for Remove title-case overrides for PHP 8.1 migration (T394556) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:04 swfrench@deploy1003: Started scap sync-world: Backport for Remove title-case overrides for PHP 8.1 migration (T394556)
  • 16:01 swfrench-wmf: finished page renames for Unicode title-case transition - T396903
  • 15:54 swfrench-wmf: starting page renames for Unicode title-case transition - T396903
  • 15:51 swfrench-wmf: renamed 1 user for Unicode title-case transition - T396903
  • 15:44 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs7003.magru.wmnet
  • 15:44 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs7003.magru.wmnet
  • 15:37 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: sync
  • 15:37 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: sync
  • 15:35 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs7003.magru.wmnet with reason: katran migration
  • 15:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5007.eqsin.wmnet
  • 15:25 ejegg: SmashPig upgraded from 8486f9fb to 52397453
  • 15:21 ejegg: SmashPig upgraded from bdc59e01 to 8486f9fb
  • 15:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5007.eqsin.wmnet
  • 15:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5007.eqsin.wmnet
  • 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf1001.eqiad.wmnet
  • 15:10 brennen@deploy1003: Finished deploy [phabricator/deployment@311587a]: deploy phab1004 for T398328 (duration: 00m 37s)
  • 15:09 brennen@deploy1003: Started deploy [phabricator/deployment@311587a]: deploy phab1004 for T398328
  • 15:09 brennen@deploy1003: Finished deploy [phabricator/deployment@311587a]: deploy phab2002 for T398328 (duration: 00m 41s)
  • 15:08 brennen@deploy1003: Started deploy [phabricator/deployment@311587a]: deploy phab2002 for T398328
  • 15:08 ejegg: standalone SmashPig upgraded from ad4baa32 to bdc59e01
  • 15:08 cgoubert@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-eqiad
  • 15:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-wf1001.eqiad.wmnet
  • 15:04 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 15:02 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 15:02 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 15:01 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 15:00 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 14:57 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 14:55 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 14:54 moritzm: failover Ganeti master in eqsin to ganeti5004
  • 14:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5006.eqsin.wmnet to cluster eqsin and group 1
  • 14:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5006.eqsin.wmnet
  • 14:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5006.eqsin.wmnet to cluster eqsin and group 1
  • 14:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5006.eqsin.wmnet
  • 14:26 cgoubert@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
  • 14:25 cgoubert@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-codfw
  • 13:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5006.eqsin.wmnet with OS bookworm
  • 13:51 cgoubert@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
  • 13:51 zabe@deploy1003: Finished scap sync-world: Backport for categorylinks: Set testwiki to read new (T397912) (duration: 09m 44s)
  • 13:45 zabe@deploy1003: zabe: Continuing with sync
  • 13:44 zabe@deploy1003: zabe: Backport for categorylinks: Set testwiki to read new (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:43 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 13:41 zabe@deploy1003: Started scap sync-world: Backport for categorylinks: Set testwiki to read new (T397912)
  • 13:40 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 13:39 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 13:37 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 13:36 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 13:35 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 13:29 urbanecm@deploy1003: Finished scap sync-world: Backport for Growth: Configure higher impact module edit limits for english and test wiki (T341599) (duration: 19m 10s)
  • 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5006.eqsin.wmnet with reason: host reimage
  • 13:23 urbanecm@deploy1003: urbanecm, cyndywikime: Continuing with sync
  • 13:21 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5006.eqsin.wmnet with reason: host reimage
  • 13:13 urbanecm@deploy1003: urbanecm, cyndywikime: Backport for Growth: Configure higher impact module edit limits for english and test wiki (T341599) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:12 jmm@dns1004: END - running authdns-update
  • 13:11 jmm@dns1004: START - running authdns-update
  • 13:10 urbanecm@deploy1003: Started scap sync-world: Backport for Growth: Configure higher impact module edit limits for english and test wiki (T341599)
  • 12:59 XioNoX: setup BGP to Paylb on pfw1-eqiad - T397865
  • 12:58 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5006.eqsin.wmnet with OS bookworm
  • 12:57 jmm@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti5006.eqsin.wmnet with reason: reimage
  • 12:53 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:51 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 12:49 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: sync
  • 12:48 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: sync
  • 12:45 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl1004.eqiad.wmnet
  • 12:39 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl1004.eqiad.wmnet
  • 12:39 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl1003.eqiad.wmnet
  • 12:38 root@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetserver1002.eqiad.wmnet
  • 12:38 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:38 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:35 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 12:34 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 12:34 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 12:33 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 12:32 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:32 root@cumin1003: START - Cookbook sre.hosts.reboot-single for host puppetserver1002.eqiad.wmnet
  • 12:31 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl1003.eqiad.wmnet
  • 12:31 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 12:31 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 12:29 root@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl1002.eqiad.wmnet
  • 12:23 jmm@dns1004: END - running authdns-update
  • 12:22 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:22 jmm@dns1004: START - running authdns-update
  • 12:21 root@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl1002.eqiad.wmnet
  • 12:21 moritzm: installing libcap2 security updates
  • 12:20 root@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl1001.eqiad.wmnet
  • 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5006.eqsin.wmnet
  • 12:15 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetserver2002.codfw.wmnet
  • 12:13 root@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl1001.eqiad.wmnet
  • 12:08 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host puppetserver2002.codfw.wmnet
  • 12:07 root@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl2005.codfw.wmnet
  • 12:02 root@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl2005.codfw.wmnet
  • 12:00 moritzm: manually clean out external_cloud_vendors directory on puppet 5 frontends to fix Puppet runs
  • 11:59 root@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl2004.codfw.wmnet
  • 11:54 root@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl2004.codfw.wmnet
  • 11:53 root@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl2003.codfw.wmnet
  • 11:47 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:47 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:46 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:45 root@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl2003.codfw.wmnet
  • 11:45 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:43 root@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl2002.codfw.wmnet
  • 11:37 jmm@dns1004: END - running authdns-update
  • 11:36 jmm@dns1004: START - running authdns-update
  • 11:35 root@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl2002.codfw.wmnet
  • 11:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: sync
  • 11:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: sync
  • 11:08 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2005.codfw.wmnet
  • 11:01 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-gp2005.codfw.wmnet
  • 11:01 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2004.codfw.wmnet
  • 10:58 root@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl2001.codfw.wmnet
  • 10:54 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-gp2004.codfw.wmnet
  • 10:50 root@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl2001.codfw.wmnet
  • 10:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1006.eqiad.wmnet
  • 10:33 jynus@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for backup2007.codfw.wmnet: Renew puppet certificate - jynus@cumin1002
  • 10:32 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-gp1006.eqiad.wmnet
  • 10:27 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2050.codfw.wmnet to cluster codfw and group B
  • 10:26 jmm@cumin1003: START - Cookbook sre.ganeti.addnode for new host ganeti2050.codfw.wmnet to cluster codfw and group B
  • 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2050.codfw.wmnet
  • 10:19 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp-test2004.wikimedia.org
  • 10:19 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:18 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test2004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 10:17 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:17 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test2004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 10:17 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2050.codfw.wmnet
  • 10:11 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 10:08 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on pc2013.codfw.wmnet,pc1013.eqiad.wmnet with reason: Switch to 10G (T378715)
  • 10:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool pc3 T378715', diff saved to https://phabricator.wikimedia.org/P78729 and previous config saved to /var/cache/conftool/dbconfig/20250701-100729-ladsgroup.json
  • 10:06 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts idp-test2004.wikimedia.org
  • 09:59 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2007.codfw.wmnet with reason: Maintenance and reboot
  • 09:57 jynus@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for backup2006.codfw.wmnet: Renew puppet certificate - jynus@cumin1002
  • 09:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
  • 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5006.eqsin.wmnet
  • 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
  • 09:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti5006.eqsin.wmnet
  • 09:50 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
  • 09:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5005.eqsin.wmnet to cluster eqsin and group 1
  • 09:49 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5005.eqsin.wmnet to cluster eqsin and group 1
  • 09:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
  • 09:33 hashar@deploy1003: Finished deploy [gerrit/gerrit@4e671a0]: Remove all references to patchdemo legacy - T391866 (duration: 00m 12s)
  • 09:32 hashar@deploy1003: Started deploy [gerrit/gerrit@4e671a0]: Remove all references to patchdemo legacy - T391866
  • 09:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
  • 09:25 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: sync
  • 09:25 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: sync
  • 09:21 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2006.codfw.wmnet with reason: Maintenance and reboot
  • 09:17 jynus@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for backup2005.codfw.wmnet: Renew puppet certificate - jynus@cumin1002
  • 09:11 kharlan@deploy1003: Finished scap sync-world: Backport for UserInfoCard: Fix opt-in to temporary account label display (T395661), UserInfoCard can unintentionally render information for more than one user (duration: 09m 15s)
  • 09:05 kharlan@deploy1003: kharlan: Continuing with sync
  • 09:04 kharlan@deploy1003: kharlan: Backport for UserInfoCard: Fix opt-in to temporary account label display (T395661), UserInfoCard can unintentionally render information for more than one user synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:02 kharlan@deploy1003: Started scap sync-world: Backport for UserInfoCard: Fix opt-in to temporary account label display (T395661), UserInfoCard can unintentionally render information for more than one user
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5005.eqsin.wmnet with OS bookworm
  • 08:55 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 08:55 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 08:54 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 08:53 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 08:44 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 08:44 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: Maintenance and reboot
  • 08:42 jynus@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for backup2004.codfw.wmnet: Renew puppet certificate - jynus@cumin1002
  • 08:38 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main'.
  • 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5005.eqsin.wmnet with reason: host reimage
  • 08:34 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 08:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5005.eqsin.wmnet with reason: host reimage
  • 08:12 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.8 refs T392178
  • 08:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5005.eqsin.wmnet with OS bookworm
  • 08:08 moritzm: installing sudo security updates
  • 08:07 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2050.codfw.wmnet with OS bookworm
  • 07:58 urbanecm: Manually start a Growth cron job via `kubectl create job growthexperiments-deleteoldsurveys-$(date +"%Y%m%d%H%M") --from=cronjobs/growthexperiments-deleteoldsurveys` to verify whether a recent failure is permanent
  • 07:55 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Corvus out of all services on: 2396 hosts
  • 07:54 vgutierrez: switching upload@ulsfo to upload TLS certificate - T394484
  • 07:52 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2050.codfw.wmnet with reason: host reimage
  • 07:48 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2050.codfw.wmnet with reason: host reimage
  • 07:43 urbanecm@deploy1003: Finished scap sync-world: Backport for nlwiki: add VRT agent user group (T398216) (duration: 12m 04s)
  • 07:43 vgutierrez@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet
  • 07:43 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2004.codfw.wmnet with reason: Maintenance and reboot
  • 07:38 vgutierrez@puppetserver1001: conftool action : set/pooled=no; selector: name=cp4045.ulsfo.wmnet
  • 07:38 urbanecm@deploy1003: urbanecm, daniuu: Continuing with sync
  • 07:37 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti5005.eqsin.wmnet with reason: reimage
  • 07:36 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti2050.codfw.wmnet with OS bookworm
  • 07:33 urbanecm@deploy1003: urbanecm, daniuu: Backport for nlwiki: add VRT agent user group (T398216) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:31 urbanecm@deploy1003: Started scap sync-world: Backport for nlwiki: add VRT agent user group (T398216)
  • 07:16 kartik@deploy1003: Finished scap sync-world: Backport for Remove cxstats campaign (T393705) (duration: 14m 17s)
  • 07:09 kartik@deploy1003: kartik: Continuing with sync
  • 07:06 kartik@deploy1003: kartik: Backport for Remove cxstats campaign (T393705) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:02 kartik@deploy1003: Started scap sync-world: Backport for Remove cxstats campaign (T393705)
  • 04:01 mwpresync@deploy1003: Pruned MediaWiki: 1.45.0-wmf.5 (duration: 01m 38s)
  • 03:58 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.45.0-wmf.8 refs T392178 (duration: 55m 48s)
  • 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.8 refs T392178
  • 02:13 ejegg: payments-wiki upgraded from 52f6940f to a92f03c3
  • 01:46 ejegg: fundraising civicrm upgraded from e35d3778 to 5ae93148
  • 00:20 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 00:19 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 00:19 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 00:18 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 00:01 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply

Archives

See Server Admin Log/Archives.