Server Admin Log/Archive 74

From Wikitech

2023-12-30

2023-12-29

  • 22:59 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:59 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:57 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:01 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:00 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:00 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:58 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:58 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:58 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:58 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:57 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:57 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:12 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:11 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:10 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:10 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:09 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:08 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:08 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:08 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:07 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:07 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:06 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:06 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:03 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:03 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:03 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:02 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:01 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:01 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply

2023-12-28

  • 23:59 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:59 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:58 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:57 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:57 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:52 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:51 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:50 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:48 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:47 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:47 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:46 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:46 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:46 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:45 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:35 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:35 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:20 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:20 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply

2023-12-27

  • 22:53 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:53 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:46 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:46 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:41 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:40 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply

2023-12-23

  • 20:22 _joe_: downgraded vopsbot on alert1001, hopefully should not keep panicking in this unexpected situation
  • 15:40 taavi: fix date-time on mw2448 (which thought it was the year 2098) by manually setting it once and then restarting systemd-timesyncd.service after bios was reset in T353679
  • 01:19 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 01:19 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.

2023-12-22

  • 17:28 krinkle@deploy2002: Synchronized php-1.42.0-wmf.10/includes/skins/Skin.php: Ice6d6c (duration: 06m 25s)
  • 15:16 jgiannelos@deploy2002: Finished deploy [restbase/deploy@5f2756a]: (no justification provided) (duration: 17m 36s)
  • 14:58 jgiannelos@deploy2002: Started deploy [restbase/deploy@5f2756a]: (no justification provided)
  • 14:57 jgiannelos@deploy2002: Finished deploy [restbase/deploy@f0c9f9f]: (no justification provided) (duration: 09m 32s)
  • 14:48 jgiannelos@deploy2002: Started deploy [restbase/deploy@f0c9f9f]: (no justification provided)
  • 14:01 jgiannelos@deploy2002: Finished deploy [restbase/deploy@4f56fff]: (no justification provided) (duration: 16m 57s)
  • 13:45 reedy@deploy2002: Finished scap: T353920 (duration: 08m 02s)
  • 13:44 jgiannelos@deploy2002: Started deploy [restbase/deploy@4f56fff]: (no justification provided)
  • 13:37 reedy@deploy2002: Started scap: T353920
  • 11:31 vgutierrez: upload golang-github-intel-go-cpuid_0.0~git20210602.5747e5c-2+deb12u1 to apt.wm.o (bookworm)
  • 10:42 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 10:42 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 10:39 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 09:57 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .

2023-12-21

  • 21:42 wfan: payment-wiki revision 1c96980a -> 3b281d10
  • 19:31 reedy@deploy2002: Synchronized wmf-config/CommonSettings.php: T346919 (duration: 06m 26s)
  • 19:14 dancy@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.10 refs T350086
  • 18:39 mutante: releases1003 - sudo chmod -R g+w /srv/org/wikimedia/releases/mediawiki/1.*
  • 17:26 mutante: mirror1001 - when syncing tails mirror - @ERROR: max connections (23) reached -- try again later
  • 17:23 mutante: [mirror1001:~] $ sudo systemctl start update-tails-mirror
  • 17:04 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 17:03 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 17:03 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 17:03 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 17:02 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 17:02 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:27 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 16:27 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 16:26 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 16:26 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 16:18 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 16:17 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 16:10 volans@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wdqs1008.eqiad.wmnet
  • 16:10 volans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:08 volans@cumin1002: START - Cookbook sre.dns.netbox
  • 16:03 volans@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs1008.eqiad.wmnet
  • 15:59 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 15:58 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 15:54 volans@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wdqs1007.eqiad.wmnet
  • 15:54 volans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:53 volans@cumin1002: START - Cookbook sre.dns.netbox
  • 15:47 volans@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs1007.eqiad.wmnet
  • 15:44 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:44 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:38 kharlan@deploy2002: Finished scap: Backport for Use username for lookup for non-existing user as the vague target (duration: 10m 37s)
  • 15:36 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:35 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:32 kharlan@deploy2002: kharlan and dreamyjazz: Continuing with sync
  • 15:30 kharlan@deploy2002: kharlan and dreamyjazz: Backport for Use username for lookup for non-existing user as the vague target synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:28 kharlan@deploy2002: Started scap: Backport for Use username for lookup for non-existing user as the vague target
  • 15:24 kharlan@deploy2002: Finished scap: Backport for Use username for lookup for non-existing user as the vague target (duration: 11m 38s)
  • 15:20 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:19 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:18 kharlan@deploy2002: kharlan and dreamyjazz: Continuing with sync
  • 15:15 kharlan@deploy2002: kharlan and dreamyjazz: Backport for Use username for lookup for non-existing user as the vague target synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:13 kharlan@deploy2002: Started scap: Backport for Use username for lookup for non-existing user as the vague target
  • 15:11 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:10 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:52 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Fix showing units and limits in NewPP limit report (T353793) (duration: 09m 27s)
  • 14:46 lucaswerkmeister-wmde@deploy2002: matmarex and lucaswerkmeister-wmde: Continuing with sync
  • 14:44 lucaswerkmeister-wmde@deploy2002: matmarex and lucaswerkmeister-wmde: Backport for Fix showing units and limits in NewPP limit report (T353793) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:43 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Fix showing units and limits in NewPP limit report (T353793)
  • 14:37 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:36 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:31 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:29 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 14:27 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Ignore "exact match" title when the title is not given (T353860) (duration: 08m 33s)
  • 14:21 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Continuing with sync
  • 14:19 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Backport for Ignore "exact match" title when the title is not given (T353860) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:18 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Ignore "exact match" title when the title is not given (T353860)
  • 14:17 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2002:~$ mwscript namespaceDupes bdwikimedia --fix # T351903 – 62 pages to fix, 62 were resolvable. 56 links to fix, 54 were resolvable, 2 were deleted.
  • 14:15 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for uzwikipedia: add a temporary logo for the 20th anniversary (T353723) (duration: 09m 28s)
  • 14:13 moritzm: re-added Eoghan to pwstore
  • 14:09 lucaswerkmeister-wmde@deploy2002: anzx and lucaswerkmeister-wmde: Continuing with sync
  • 14:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18 days, 0:00:00 on 10 hosts with reason: T352878
  • 14:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 18 days, 0:00:00 on 10 hosts with reason: T352878
  • 14:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18 days, 0:00:00 on 13 hosts with reason: T352878
  • 14:08 lucaswerkmeister-wmde@deploy2002: anzx and lucaswerkmeister-wmde: Backport for uzwikipedia: add a temporary logo for the 20th anniversary (T353723) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:07 bking@cumin2002: START - Cookbook sre.hosts.downtime for 18 days, 0:00:00 on 13 hosts with reason: T352878
  • 14:06 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for uzwikipedia: add a temporary logo for the 20th anniversary (T353723)
  • 13:50 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 13:23 moritzm: installing libde265 security updates
  • 12:29 volans@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wdqs1006.eqiad.wmnet
  • 12:29 volans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:27 volans@cumin1002: START - Cookbook sre.dns.netbox
  • 12:20 volans@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs1006.eqiad.wmnet
  • 12:18 volans@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin1002.eqiad.wmnet with reason: Release v0.6.5 - volans@cumin1002
  • 12:14 volans@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin1002.eqiad.wmnet with reason: Release v0.6.5 - volans@cumin1002
  • 11:37 claime: Manually restarted cassandra-a service on restbase2028 following OOM - T353456
  • 11:23 volans@cumin1002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts wdqs1006.eqiad.wmnet
  • 11:22 volans@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs1006.eqiad.wmnet
  • 11:16 volans@cumin1002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts wdqs1006.eqiad.wmnet
  • 11:13 volans@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs1006.eqiad.wmnet
  • 10:42 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:42 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:29 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 09:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1006
  • 09:40 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1006
  • 08:59 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 08:54 apergos: UTC morning backport and config window done
  • 08:50 ariel@deploy2002: Finished scap: Backport for CommentFormatter: Do not add wrapper if the heading has attributes (T353489) (duration: 12m 39s)
  • 08:44 ariel@deploy2002: ariel and matmarex: Continuing with sync
  • 08:39 ariel@deploy2002: ariel and matmarex: Backport for CommentFormatter: Do not add wrapper if the heading has attributes (T353489) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:37 ariel@deploy2002: Started scap: Backport for CommentFormatter: Do not add wrapper if the heading has attributes (T353489)
  • 08:25 ariel@deploy2002: Finished scap: Backport for CommentFormatter: Do not add wrapper if the heading has attributes (T353489) (duration: 11m 07s)
  • 08:19 ariel@deploy2002: matmarex and ariel: Continuing with sync
  • 08:16 ariel@deploy2002: matmarex and ariel: Backport for CommentFormatter: Do not add wrapper if the heading has attributes (T353489) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:14 ariel@deploy2002: Started scap: Backport for CommentFormatter: Do not add wrapper if the heading has attributes (T353489)
  • 05:56 kart_: Updated MinT to 2023-12-20-071058-production
  • 05:50 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 05:42 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 05:40 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 05:35 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 05:29 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 05:26 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2075.codfw.wmnet with OS bullseye
  • 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 00:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2075.codfw.wmnet with reason: host reimage
  • 00:24 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2075.codfw.wmnet with reason: host reimage

2023-12-20

  • 23:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2075.codfw.wmnet with OS bullseye
  • 23:44 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2075.codfw.wmnet with OS bullseye
  • 23:24 ryankemper@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host netbox1002
  • 23:24 ryankemper@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host netbox1002
  • 23:19 ryankemper@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wdqs1006
  • 23:19 ryankemper@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1006
  • 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2075.codfw.wmnet with OS bullseye
  • 22:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs[1020-1021].eqiad.wmnet
  • 22:59 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for wdqs[1020-1021].eqiad.wmnet
  • 22:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18 days, 0:00:00 on wdqs[1020-1024].eqiad.wmnet with reason: T352878
  • 22:58 bking@cumin2002: START - Cookbook sre.hosts.downtime for 18 days, 0:00:00 on wdqs[1020-1024].eqiad.wmnet with reason: T352878
  • 22:25 ryankemper@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wdqs[1006-1008].eqiad.wmnet
  • 22:25 ryankemper@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:25 ryankemper@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[1006-1008].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1002"
  • 22:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2075.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2080.codfw.wmnet with OS bullseye
  • 22:24 ryankemper@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[1006-1008].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1002"
  • 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2079.codfw.wmnet with OS bullseye
  • 22:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2075.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2075']
  • 22:21 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2075']
  • 22:20 ryankemper@cumin1002: START - Cookbook sre.dns.netbox
  • 22:18 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
  • 22:18 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
  • 22:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2077.codfw.wmnet with OS bullseye
  • 22:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:17 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
  • 22:17 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
  • 22:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2078.codfw.wmnet with OS bullseye
  • 22:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2076.codfw.wmnet with OS bullseye
  • 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:16 ryankemper@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs[1006-1008].eqiad.wmnet
  • 22:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:13 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
  • 22:12 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
  • 22:10 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
  • 22:09 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
  • 22:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:08 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
  • 22:08 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
  • 22:06 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
  • 22:05 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
  • 22:03 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2033.codfw.wmnet with OS bullseye
  • 22:03 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
  • 22:02 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
  • 21:59 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
  • 21:59 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
  • 21:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2080.codfw.wmnet with reason: host reimage
  • 21:59 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
  • 21:59 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
  • 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2079.codfw.wmnet with reason: host reimage
  • 21:56 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
  • 21:56 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
  • 21:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2077.codfw.wmnet with reason: host reimage
  • 21:54 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
  • 21:54 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
  • 21:53 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
  • 21:53 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
  • 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2076.codfw.wmnet with reason: host reimage
  • 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2080.codfw.wmnet with reason: host reimage
  • 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2079.codfw.wmnet with reason: host reimage
  • 21:48 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
  • 21:48 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 21:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2077.codfw.wmnet with reason: host reimage
  • 21:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
  • 21:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2076.codfw.wmnet with reason: host reimage
  • 21:48 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
  • 21:47 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
  • 21:47 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
  • 21:46 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
  • 21:45 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
  • 21:45 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lsw1-a8-codfw,lsw1-a8-codfw IPv6 with reason: testing commit confirm check in cookbook
  • 21:45 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lsw1-a8-codfw,lsw1-a8-codfw IPv6 with reason: testing commit confirm check in cookbook
  • 21:41 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
  • 21:40 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
  • 21:39 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
  • 21:39 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
  • 21:38 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
  • 21:37 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
  • 21:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2080.codfw.wmnet with OS bullseye
  • 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2079.codfw.wmnet with OS bullseye
  • 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2078.codfw.wmnet with OS bullseye
  • 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2077.codfw.wmnet with OS bullseye
  • 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2076.codfw.wmnet with OS bullseye
  • 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2075.codfw.wmnet with OS bullseye
  • 21:30 dancy@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.10 refs T350086 (duration: 05m 57s)
  • 21:28 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host restbase2033.codfw.wmnet with OS bullseye
  • 21:26 eevans@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2033.codfw.wmnet with OS bullseye
  • 21:24 dancy@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.10 refs T350086
  • 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2074.codfw.wmnet with OS bullseye
  • 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:21 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:15 ladsgroup@deploy2002: Finished scap: Backport for Protect against ParserOutput re-namespacing (T353835) (duration: 08m 13s)
  • 21:09 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 21:08 ladsgroup@deploy2002: ladsgroup: Backport for Protect against ParserOutput re-namespacing (T353835) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:08 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
  • 21:07 ladsgroup@deploy2002: Started scap: Backport for Protect against ParserOutput re-namespacing (T353835)
  • 21:04 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
  • 21:02 aqu@deploy2002: Finished deploy [airflow-dags/research@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 28s)
  • 21:01 aqu@deploy2002: Started deploy [airflow-dags/research@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
  • 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2074.codfw.wmnet with reason: host reimage
  • 20:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2074.codfw.wmnet with reason: host reimage
  • 20:49 ladsgroup@deploy2002: Finished scap: Backport for Protect against ParserOutput re-namespacing (T353835) (duration: 08m 19s)
  • 20:47 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host restbase2033.codfw.wmnet with OS bullseye
  • 20:47 eevans@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2033.codfw.wmnet with OS bullseye
  • 20:43 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 20:42 ladsgroup@deploy2002: ladsgroup: Backport for Protect against ParserOutput re-namespacing (T353835) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:40 ladsgroup@deploy2002: Started scap: Backport for Protect against ParserOutput re-namespacing (T353835)
  • 20:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2074.codfw.wmnet with OS bullseye
  • 20:31 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host restbase2033.codfw.wmnet with OS bullseye
  • 20:30 eevans@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2033.codfw.wmnet with OS bullseye
  • 19:51 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
  • 19:48 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
  • 19:30 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host restbase2033.codfw.wmnet with OS bullseye
  • 19:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host wdqs1022.eqiad.wmnet
  • 19:27 dancy@deploy2002: Finished php-fpm-restarts
  • 19:24 dancy@deploy2002: Starting php-fpm-restarts
  • 19:18 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.10 refs T350086
  • 18:59 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 18:59 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 18:59 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 18:58 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 18:58 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 18:57 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 18:38 krinkle@deploy2002: Finished deploy [integration/docroot@355ddbb]: (no justification provided) (duration: 00m 07s)
  • 18:38 krinkle@deploy2002: Started deploy [integration/docroot@355ddbb]: (no justification provided)
  • 18:06 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
  • 18:06 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
  • 18:05 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
  • 18:05 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
  • 18:05 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
  • 18:05 cmooney@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
  • 17:26 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1022.eqiad.wmnet
  • 17:25 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs1022.eqiad.wmnet
  • 17:25 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1022.eqiad.wmnet
  • 17:05 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 16:03 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:03 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be2080.codfw.wmnet with OS bullseye
  • 15:22 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2074.codfw.wmnet with OS bullseye
  • 15:18 Lucas_WMDE: UTC afternoon backport+config window done
  • 15:17 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Replace $wgCommandLineMode checks with MW_ENTRY_POINT (T353751) (duration: 08m 22s)
  • 15:15 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts wdqs1022.eqiad.wmnet
  • 15:11 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Continuing with sync
  • 15:10 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Backport for Replace $wgCommandLineMode checks with MW_ENTRY_POINT (T353751) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:09 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts wdqs1024.eqiad.wmnet
  • 15:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1024.eqiad.wmnet
  • 15:08 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Replace $wgCommandLineMode checks with MW_ENTRY_POINT (T353751)
  • 15:06 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1022.eqiad.wmnet
  • 15:05 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts wdqs1022.eqiad.wmnet
  • 15:05 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1022.eqiad.wmnet
  • 15:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs1023.eqiad.wmnet
  • 15:05 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1023.eqiad.wmnet
  • 15:05 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 15:04 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 15:02 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1024.eqiad.wmnet
  • 15:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1024.eqiad.wmnet
  • 15:01 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs1024.eqiad.wmnet
  • 15:01 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1024.eqiad.wmnet
  • 14:58 inflatador: bking@cumin2002 disable/mask wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-categories on wdqs102[24] T352878
  • 14:57 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for RunSingleJob.php: Fix use of MWExceptionHandler before it's defined (T352265) (duration: 10m 30s)
  • 14:51 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Continuing with sync
  • 14:48 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Backport for RunSingleJob.php: Fix use of MWExceptionHandler before it's defined (T352265) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:46 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for RunSingleJob.php: Fix use of MWExceptionHandler before it's defined (T352265)
  • 14:43 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Remove BetaFeature code related to ReferencePreviews (T351708), Remove wgPopupsReferencePreviews now that it defaults to true (T351708) (duration: 10m 16s)
  • 14:37 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and awight and wmde-fisch: Continuing with sync
  • 14:35 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and awight and wmde-fisch: Backport for Remove BetaFeature code related to ReferencePreviews (T351708), Remove wgPopupsReferencePreviews now that it defaults to true (T351708) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:33 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Remove BetaFeature code related to ReferencePreviews (T351708), Remove wgPopupsReferencePreviews now that it defaults to true (T351708)
  • 14:30 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Check for false from ThumbnailImage::getStoragePath (T353758) (duration: 09m 38s)
  • 14:26 lucaswerkmeister-wmde@deploy2002: kharlan and lucaswerkmeister-wmde: Continuing with sync
  • 14:22 lucaswerkmeister-wmde@deploy2002: kharlan and lucaswerkmeister-wmde: Backport for Check for false from ThumbnailImage::getStoragePath (T353758) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:21 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Check for false from ThumbnailImage::getStoragePath (T353758)
  • 14:19 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Make wiktionary and mw.org provide og:site_name (T348203) (duration: 15m 54s)
  • 14:16 moritzm: installing distro-info-data updates from Bookworm point release
  • 14:14 lucaswerkmeister-wmde@deploy2002: pols12 and lucaswerkmeister-wmde: Continuing with sync
  • 14:12 moritzm: installing debootstrap bugfix updates from Bookworm point release
  • 14:06 lucaswerkmeister-wmde@deploy2002: pols12 and lucaswerkmeister-wmde: Backport for Make wiktionary and mw.org provide og:site_name (T348203) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:04 moritzm: installing cups updates from Bookworm point release
  • 14:04 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Make wiktionary and mw.org provide og:site_name (T348203)
  • 13:38 aqu@deploy2002: Finished deploy [airflow-dags/wmde@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac513] (duration: 00m 05s)
  • 13:38 aqu@deploy2002: Started deploy [airflow-dags/wmde@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac513]
  • 13:38 aqu@deploy2002: Finished deploy [airflow-dags/search@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 30s)
  • 13:37 aqu@deploy2002: Started deploy [airflow-dags/search@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
  • 13:37 aqu@deploy2002: Finished deploy [airflow-dags/research@90f280e]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@e2ed6162] (duration: 00m 06s)
  • 13:37 aqu@deploy2002: Started deploy [airflow-dags/research@90f280e]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@e2ed6162]
  • 13:36 aqu@deploy2002: Finished deploy [airflow-dags/platform_eng@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 25s)
  • 13:36 aqu@deploy2002: Started deploy [airflow-dags/platform_eng@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
  • 13:35 aqu@deploy2002: Finished deploy [airflow-dags/analytics_product@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 09s)
  • 13:35 aqu@deploy2002: Started deploy [airflow-dags/analytics_product@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
  • 13:34 aqu@deploy2002: Finished deploy [airflow-dags/analytics@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 05s)
  • 13:34 aqu@deploy2002: Started deploy [airflow-dags/analytics@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
  • 13:34 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 11s)
  • 13:34 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
  • 13:32 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 01s)
  • 13:32 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
  • 13:31 aqu@deploy2002: Finished deploy [airflow-dags/analytics@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 01s)
  • 13:31 aqu@deploy2002: Started deploy [airflow-dags/analytics@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
  • 12:12 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 11:30 kostajh: T353703 Manual run: /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/mediamoderation.dblist extensions/MediaModeration/maintenance/updateMetrics.php --verbose
  • 10:22 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on mw2448.codfw.wmnet with reason: hw failure
  • 10:22 cgoubert@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on mw2448.codfw.wmnet with reason: hw failure
  • 09:43 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough and A:wikidough
  • 09:39 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh5002.wikimedia.org
  • 09:39 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for doh5002.wikimedia.org
  • 09:10 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh2001.wikimedia.org
  • 09:10 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for doh2001.wikimedia.org
  • 08:47 fabfur@cumin1001: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough and A:wikidough
  • 06:31 ryankemper: T351671 Pooled `wdqs10[17-21]*`; data xfers completed and test queries are passing on `wdqs1018`. Will decom related hosts tomorrow (2023-12-20)
  • 02:47 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 02:45 rzl@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 02:44 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 02:43 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 02:43 rzl@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 02:41 rzl@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 02:39 rzl@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 02:37 rzl@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 02:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 02:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:34 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:34 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:25 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 22:00:00 on wdqs[1017-1021].eqiad.wmnet with reason: bringing new wdqs hosts online T351671
  • 00:02 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 22:00:00 on wdqs[1017-1021].eqiad.wmnet with reason: bringing new wdqs hosts online T351671

2023-12-19

  • 22:55 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:54 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:53 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on wdqs[1017-1021].eqiad.wmnet with reason: bringing new wdqs hosts online T351671
  • 22:26 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on wdqs[1017-1021].eqiad.wmnet with reason: bringing new wdqs hosts online T351671
  • 21:43 mforns@deploy2002: Finished deploy [airflow-dags/wmde@d5ac513]: (no justification provided) (duration: 00m 11s)
  • 21:43 mforns@deploy2002: Started deploy [airflow-dags/wmde@d5ac513]: (no justification provided)
  • 21:43 mforns@deploy2002: Finished deploy [airflow-dags/analytics@d5ac513]: (no justification provided) (duration: 00m 27s)
  • 21:43 mforns@deploy2002: Started deploy [airflow-dags/analytics@d5ac513]: (no justification provided)
  • 21:39 ladsgroup@deploy2002: Finished scap: Backport for Disable listings extension in more wikis (T253216) (duration: 07m 42s)
  • 21:33 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 21:32 ladsgroup@deploy2002: ladsgroup: Backport for Disable listings extension in more wikis (T253216) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:31 ladsgroup@deploy2002: Started scap: Backport for Disable listings extension in more wikis (T253216)
  • 21:26 kostajh: UTC late deploys done
  • 21:26 kharlan@deploy2002: Finished scap: Backport for Undeploy Annual Plan Core Metrics survey (T351353) (duration: 10m 00s)
  • 21:20 kharlan@deploy2002: kharlan and dani: Continuing with sync
  • 21:17 kharlan@deploy2002: kharlan and dani: Backport for Undeploy Annual Plan Core Metrics survey (T351353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:16 kharlan@deploy2002: Started scap: Backport for Undeploy Annual Plan Core Metrics survey (T351353)
  • 21:14 kharlan@deploy2002: Finished scap: Backport for MediaModeration: Add dblist (T353703) (duration: 07m 44s)
  • 21:08 kharlan@deploy2002: kharlan: Continuing with sync
  • 21:08 kharlan@deploy2002: kharlan: Backport for MediaModeration: Add dblist (T353703) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:06 kharlan@deploy2002: Started scap: Backport for MediaModeration: Add dblist (T353703)
  • 19:10 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.10 refs T350086
  • 18:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testhost2001.codfw.wmnet with OS bullseye
  • 18:56 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:49 mforns@deploy2002: Finished deploy [analytics/refinery@28dccef] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@28dccefe] (duration: 00m 05s)
  • 18:48 mforns@deploy2002: Started deploy [analytics/refinery@28dccef] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@28dccefe]
  • 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:43 mforns@deploy2002: Finished deploy [analytics/refinery@28dccef] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@28dccefe] (duration: 03m 16s)
  • 18:39 mforns@deploy2002: Started deploy [analytics/refinery@28dccef] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@28dccefe]
  • 18:39 mforns@deploy2002: Finished deploy [analytics/refinery@28dccef] (thin): Regular analytics weekly train THIN [analytics/refinery@28dccefe] (duration: 00m 06s)
  • 18:39 mforns@deploy2002: Started deploy [analytics/refinery@28dccef] (thin): Regular analytics weekly train THIN [analytics/refinery@28dccefe]
  • 18:39 mforns@deploy2002: Finished deploy [analytics/refinery@28dccef]: Regular analytics weekly train [analytics/refinery@28dccefe] (duration: 09m 18s)
  • 18:29 mforns@deploy2002: Started deploy [analytics/refinery@28dccef]: Regular analytics weekly train [analytics/refinery@28dccefe]
  • 18:29 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@d275e4f]: Deploy latest DAG changes to Analytics Airflow instance (duration: 00m 31s)
  • 18:28 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@d275e4f]: Deploy latest DAG changes to Analytics Airflow instance
  • 18:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testhost2001.codfw.wmnet with reason: host reimage
  • 18:22 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testhost2001.codfw.wmnet with reason: host reimage
  • 18:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bullseye
  • 18:06 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host testhost2001.codfw.wmnet with OS bullseye
  • 17:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bullseye
  • 16:23 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 16:15 aikochou@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 16:12 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on moss-be[2001-2003].codfw.wmnet with reason: not in service, being used to test a destructive cookbook
  • 16:12 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on moss-be[2001-2003].codfw.wmnet with reason: not in service, being used to test a destructive cookbook
  • 16:04 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 327700
  • 16:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 327700
  • 16:02 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 139901
  • 16:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 139901
  • 16:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15133
  • 15:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15133
  • 15:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5398
  • 15:55 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw2448.codfw.wmnet with reason: hw failure
  • 15:55 cgoubert@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw2448.codfw.wmnet with reason: hw failure
  • 15:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 5398
  • 15:42 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Change virtual domain of botpassword to plural (T351559) (duration: 07m 01s)
  • 15:38 moritzm: installing gnutls28 security updates on bookworm
  • 15:37 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and ladsgroup: Continuing with sync
  • 15:37 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and ladsgroup: Backport for Change virtual domain of botpassword to plural (T351559) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:35 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Change virtual domain of botpassword to plural (T351559)
  • 15:33 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Use main replica DB in importExistingFilesToScanTable.php (duration: 07m 47s)
  • 15:27 lucaswerkmeister-wmde@deploy2002: kharlan and lucaswerkmeister-wmde: Continuing with sync
  • 15:27 lucaswerkmeister-wmde@deploy2002: kharlan and lucaswerkmeister-wmde: Backport for Use main replica DB in importExistingFilesToScanTable.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:25 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Use main replica DB in importExistingFilesToScanTable.php
  • 15:23 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: host is down, downtiming in icinga too
  • 15:23 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: host is down, downtiming in icinga too
  • 15:22 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Make SearchEntitiesIntegrationTest an ApiTestCase (T353334), Use link batch in search APIs (T353334) (duration: 08m 49s)
  • 15:16 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
  • 15:15 moritzm: installing exim4 bugfix updates from Bookworm point release
  • 15:15 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Make SearchEntitiesIntegrationTest an ApiTestCase (T353334), Use link batch in search APIs (T353334) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:13 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Make SearchEntitiesIntegrationTest an ApiTestCase (T353334), Use link batch in search APIs (T353334)
  • 15:10 moritzm: installing nagios-plugins-contrib bugfix updates from Bookworm point release
  • 14:44 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:43 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:43 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:42 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:33 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 14:32 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 14:31 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 14:30 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 14:29 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 14:29 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
  • 14:25 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Send PhotoDNA the mime type of the thumbnail and not original file (T351401), Add maintenance script to scan files in the mediamoderation_scan table (T351399) (duration: 07m 53s)
  • 14:24 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:24 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:24 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:24 kamila@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:22 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 14:21 kamila@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
  • 14:21 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:19 kamila@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:19 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and kharlan: Continuing with sync
  • 14:18 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and kharlan: Backport for Send PhotoDNA the mime type of the thumbnail and not original file (T351401), Add maintenance script to scan files in the mediamoderation_scan table (T351399) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:17 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Send PhotoDNA the mime type of the thumbnail and not original file (T351401), Add maintenance script to scan files in the mediamoderation_scan table (T351399)
  • 14:15 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for testwiki: enable revertrisk model in ores extension (T348298) (duration: 10m 22s)
  • 14:10 lucaswerkmeister-wmde@deploy2002: isaranto and lucaswerkmeister-wmde: Continuing with sync
  • 14:08 lucaswerkmeister-wmde@deploy2002: isaranto and lucaswerkmeister-wmde: Backport for testwiki: enable revertrisk model in ores extension (T348298) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:05 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for testwiki: enable revertrisk model in ores extension (T348298)
  • 13:45 jgiannelos@deploy2002: Finished deploy [restbase/deploy@40c15b1]: (no justification provided) (duration: 27m 26s)
  • 13:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Release v0.6.5 - ayounsi@cumin1001
  • 13:35 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Release v0.6.5 - ayounsi@cumin1001
  • 13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin1001.eqiad.wmnet with reason: Release v0.6.5 - ayounsi@cumin1001
  • 13:32 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin1001.eqiad.wmnet with reason: Release v0.6.5 - ayounsi@cumin1001
  • 13:17 jgiannelos@deploy2002: Started deploy [restbase/deploy@40c15b1]: (no justification provided)
  • 13:12 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 13:08 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:08 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:05 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:05 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:02 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
  • 12:24 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:24 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: WIP
  • 12:21 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: WIP
  • 11:31 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:46 moritzm: installing perl security updates on bookworm
  • 10:19 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:14 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
  • 10:14 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
  • 09:45 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
  • 09:45 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
  • 09:23 elukey: reload thanos-rule on titan2001
  • 08:27 jmm@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lists1003.wikimedia.org
  • 08:27 jmm@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:27 jmm@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lists1003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin1002"
  • 08:26 jmm@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lists1003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin1002"
  • 08:22 jmm@cumin1002: START - Cookbook sre.dns.netbox
  • 08:17 jmm@cumin1002: START - Cookbook sre.hosts.decommission for hosts lists1003.wikimedia.org
  • 06:13 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:07 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:10 kart_: Updated MinT to 2023-12-12-065316-production
  • 04:56 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 04:54 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.10 refs T350086 (duration: 51m 03s)
  • 04:49 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 04:49 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 04:43 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 04:40 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 04:36 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 04:09 cstone: civicrm upgraded from e2d49d10 to c3cc80c7
  • 04:03 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.10 refs T350086

2023-12-18

  • 23:40 taavi: conftool codfw/appserver/nginx/mw2448.codfw.wmnet: pooled changed yes => inactive # T353679, not sure why it was not logged automatically
  • 22:35 maryum: Deployed patch for T347704
  • 22:08 dancy: UTC late backport window completed.
  • 22:07 dancy@deploy2002: Finished scap: Backport for Revert "Fix English Gboard backspace over aliens" (T353578 T325129), Revert "Put zero-width space after inline focusable nodes" (T353578 T330284), Update VE core submodule to wmf.9 (6bada65) (T353578 T330284 T325129) (duration: 13m 34s)
  • 21:57 dancy@deploy2002: dancy and kemayo: Continuing with sync
  • 21:56 dancy@deploy2002: dancy and kemayo: Backport for Revert "Fix English Gboard backspace over aliens" (T353578 T325129), Revert "Put zero-width space after inline focusable nodes" (T353578 T330284), Update VE core submodule to wmf.9 (6bada65) (T353578 T330284 T325129) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:54 dancy@deploy2002: Started scap: Backport for Revert "Fix English Gboard backspace over aliens" (T353578 T325129), Revert "Put zero-width space after inline focusable nodes" (T353578 T330284), Update VE core submodule to wmf.9 (6bada65) (T353578 T330284 T325129)
  • 21:17 dancy@deploy2002: Finished scap: Backport for Undeploy Reader Demographics 2 survey (T344393) (duration: 08m 30s)
  • 21:11 dancy@deploy2002: dani and dancy: Continuing with sync
  • 21:10 dancy@deploy2002: dani and dancy: Backport for Undeploy Reader Demographics 2 survey (T344393) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:09 dancy@deploy2002: Started scap: Backport for Undeploy Reader Demographics 2 survey (T344393)
  • 21:05 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 21:05 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 21:04 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 21:04 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 21:03 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 21:03 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 21:01 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 21:01 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:53 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:53 otto@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:52 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:52 otto@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:48 otto@deploy2002: Synchronized wmf-config/ext-EventStreamConfig.php: Config: Add message_key_fields to page_content_change stream (T338231) (duration: 06m 32s)
  • 20:31 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:31 otto@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:19 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:19 otto@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:14 inflatador: bking@kafka-jumbo1007 kafka topics --alter --topic codfw.mediawiki.cirrussearch.page_rerender.v1 --partitions 5 T351503
  • 17:12 inflatador: bking@kafka-jumbo1007 kafka topics --alter --topic eqiad.mediawiki.cirrussearch.page_rerender.v1 --partitions 5 T351503
  • 17:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2074.codfw.wmnet with OS bullseye
  • 16:56 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:56 akosiaris@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to mc-gp[12]00[123] - akosiaris@cumin1001"
  • 16:55 akosiaris@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to mc-gp[12]00[123] - akosiaris@cumin1001"
  • 16:54 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 28s)
  • 16:52 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 16:52 akosiaris@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 16:52 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 16:48 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 08s)
  • 16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2076']
  • 16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2075']
  • 16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2074']
  • 16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2079']
  • 16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2077']
  • 16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2080']
  • 16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2078']
  • 16:35 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2080']
  • 16:35 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2080
  • 16:34 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:34 akosiaris@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to mc2042-mc2055 - akosiaris@cumin1001"
  • 16:33 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2080
  • 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be1077
  • 16:33 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be1077
  • 16:33 akosiaris@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to mc2042-mc2055 - akosiaris@cumin1001"
  • 16:31 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 16:28 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 16:28 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 16:25 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 16:25 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 16:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2080']
  • 16:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2080']
  • 16:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2079']
  • 16:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2078']
  • 16:22 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2077']
  • 16:22 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2076']
  • 16:21 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2075']
  • 16:21 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2074']
  • 16:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2079']
  • 16:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2080']
  • 16:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2079']
  • 16:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be2079']
  • 16:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2080']
  • 16:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2078']
  • 16:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2077']
  • 16:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2076']
  • 16:18 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2076']
  • 16:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2075']
  • 16:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2074']
  • 16:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2079']
  • 16:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2079']
  • 16:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2078']
  • 16:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2077']
  • 16:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2075']
  • 16:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2074']
  • 16:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2080.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2079.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2076.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2078.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2077.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2075.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2074.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2080.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2079.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2077.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2076.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2075.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2074.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:42 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2074-2080 to codfw - jhancock@cumin2002"
  • 15:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2074-2080 to codfw - jhancock@cumin2002"
  • 15:37 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
  • 15:36 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
  • 15:16 fabfur: repooling cp4037 (T352876)
  • 15:16 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp4037.ulsfo.wmnet
  • 15:16 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp4037.ulsfo.wmnet
  • 15:04 urbanecm@deploy2002: Finished scap: Backport for Configure and enable StatsLib for production (T343024), Revert "util.main: Don't use mw.Map(), use a native Map() instead" (T353571 T353076) (duration: 10m 20s)
  • 14:58 urbanecm@deploy2002: cwhite and urbanecm and chlod: Continuing with sync
  • 14:55 urbanecm@deploy2002: cwhite and urbanecm and chlod: Backport for Configure and enable StatsLib for production (T343024), Revert "util.main: Don't use mw.Map(), use a native Map() instead" (T353571 T353076) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:53 urbanecm@deploy2002: Started scap: Backport for Configure and enable StatsLib for production (T343024), Revert "util.main: Don't use mw.Map(), use a native Map() instead" (T353571 T353076)
  • 14:52 urbanecm@deploy2002: Finished scap: Backport for Enable action blocks for zhwiki (T353120) (duration: 08m 58s)
  • 14:47 urbanecm@deploy2002: milkydefer and urbanecm: Continuing with sync
  • 14:45 moritzm: installing nagios-plugins-contrib bugfix updates from Bookworm point release
  • 14:45 moritzm: installing nagios-plugins-contrib bugfix updates
  • 14:44 urbanecm@deploy2002: milkydefer and urbanecm: Backport for Enable action blocks for zhwiki (T353120) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:44 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@d275e4f]: (no justification provided) (duration: 00m 32s)
  • 14:44 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@d275e4f]: (no justification provided)
  • 14:43 urbanecm@deploy2002: Started scap: Backport for Enable action blocks for zhwiki (T353120)
  • 14:43 urbanecm@deploy2002: Finished scap: Backport for Add a testing stream for page-prediction-change events (T349919), CheckUser: Enable read new for event tables migration everywhere (T341829) (duration: 19m 00s)
  • 14:37 urbanecm@deploy2002: dreamyjazz and aikochou and urbanecm: Continuing with sync
  • 14:36 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
  • 14:35 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
  • 14:34 urbanecm@deploy2002: dreamyjazz and aikochou and urbanecm: Backport for Add a testing stream for page-prediction-change events (T349919), CheckUser: Enable read new for event tables migration everywhere (T341829) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:24 urbanecm@deploy2002: Started scap: Backport for Add a testing stream for page-prediction-change events (T349919), CheckUser: Enable read new for event tables migration everywhere (T341829)
  • 14:13 moritzm: installing node-undici security updates
  • 13:15 moritzm: installing intel-microcode security updates on buster hosts
  • 13:08 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS bullseye
  • 12:56 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 12:55 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 12:52 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 12:51 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 12:50 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:50 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:45 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
  • 12:41 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
  • 12:27 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-canary
  • 12:26 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-canary
  • 12:26 kamila@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 12:25 kamila@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 12:24 kamila@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:23 kamila@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 12:20 kamila@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:20 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
  • 12:20 kamila@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 12:19 fabfur@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
  • 12:19 kamila@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:18 kamila@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:14 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:13 kamila@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:12 Emperor: restart swift-proxy and envoyproxy on ms-fe1012
  • 12:10 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:09 kamila@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:04 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:03 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 12:01 moritzm: installing ncurses security updates
  • 11:59 kamila@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:58 kamila@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:51 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
  • 11:51 fabfur@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
  • 11:41 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
  • 11:41 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
  • 11:39 moritzm: installing qemu security updates on bookworm
  • 11:38 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
  • 11:37 fabfur@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
  • 11:36 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
  • 11:36 fabfur@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
  • 10:56 moritzm: restarting apache/FPM on mw canaries to pick up gnutls update
  • 10:52 moritzm: installing gnutls28 security updates
  • 10:47 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
  • 10:44 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
  • 10:39 moritzm: installing jetty9 security updates
  • 10:29 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 10:29 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 10:17 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
  • 10:13 XioNoX: remove VRRP pinning on cr1-eqiad/cr2-eqiad/cr2-codfw
  • 10:09 moritzm: installing Linux 6.1.67 updates on Bookworm hosts
  • 09:45 XioNoX: make eqiad-codfw 100G link primary
  • 09:10 vgutierrez: vgutierrez@acmechief1002:~$ sudo -i keyholder arm - T352242

2023-12-17

  • 12:59 elukey: restart kubelet on ml-serve1001 (errors while syncing old containers)

2023-12-16

  • 01:21 eevans@deploy2002: Finished deploy [cassandra/logstash-logback-encoder@fb10de1]: (no justification provided) (duration: 00m 10s)
  • 01:21 eevans@deploy2002: Started deploy [cassandra/logstash-logback-encoder@fb10de1]: (no justification provided)
  • 00:44 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@63804c4]: (no justification provided) (duration: 00m 25s)
  • 00:44 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@63804c4]: (no justification provided)
  • 00:05 jhathaway: unbreaking my puppet change with, https://gerrit.wikimedia.org/r/c/operations/puppet/+/983504

2023-12-15

  • 23:46 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@9600237]: (no justification provided) (duration: 00m 27s)
  • 23:46 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@9600237]: (no justification provided)
  • 23:06 milimetric@deploy2002: Finished deploy [airflow-dags/platform_eng@160d0f0]: (no justification provided) (duration: 00m 25s)
  • 23:06 milimetric@deploy2002: Started deploy [airflow-dags/platform_eng@160d0f0]: (no justification provided)
  • 22:42 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:42 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:03 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@5090fdc]: (no justification provided) (duration: 00m 25s)
  • 22:03 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@5090fdc]: (no justification provided)
  • 21:48 milimetric@deploy2002: Finished deploy [analytics/refinery@eeb98ac] (thin): Syncing changes to HDFS (duration: 00m 06s)
  • 21:48 milimetric@deploy2002: Started deploy [analytics/refinery@eeb98ac] (thin): Syncing changes to HDFS
  • 21:48 milimetric@deploy2002: Finished deploy [analytics/refinery@eeb98ac]: Syncing changes to HDFS (duration: 81m 46s)
  • 21:26 mutante: running puppet on all prometheus*
  • 20:26 milimetric@deploy2002: Started deploy [analytics/refinery@eeb98ac]: Syncing changes to HDFS
  • 15:44 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 15:25 klausman@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 15:01 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:00 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 14:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
  • 14:46 arnaudb@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 100%: candidate master repooling', diff saved to https://phabricator.wikimedia.org/P54482 and previous config saved to /var/cache/conftool/dbconfig/20231215-144624-arnaudb.json
  • 14:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
  • 14:45 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
  • 14:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
  • 14:40 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:39 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: candidate master proper repooling', diff saved to https://phabricator.wikimedia.org/P54481 and previous config saved to /var/cache/conftool/dbconfig/20231215-143812-arnaudb.json
  • 14:31 arnaudb@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 80%: candidate master repooling', diff saved to https://phabricator.wikimedia.org/P54480 and previous config saved to /var/cache/conftool/dbconfig/20231215-143118-arnaudb.json
  • 14:27 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 14:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on db2194.codfw.wmnet with reason: production freeze will occur before cookbook is finished
  • 14:27 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 14:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on db2194.codfw.wmnet with reason: production freeze will occur before cookbook is finished
  • 14:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: candidate master proper repooling', diff saved to https://phabricator.wikimedia.org/P54479 and previous config saved to /var/cache/conftool/dbconfig/20231215-142307-arnaudb.json
  • 14:16 arnaudb@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 40%: candidate master repooling', diff saved to https://phabricator.wikimedia.org/P54478 and previous config saved to /var/cache/conftool/dbconfig/20231215-141613-arnaudb.json
  • 14:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 50%: candidate master proper repooling', diff saved to https://phabricator.wikimedia.org/P54477 and previous config saved to /var/cache/conftool/dbconfig/20231215-140802-arnaudb.json
  • 14:07 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:07 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 14:01 arnaudb@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 20%: candidate master repooling', diff saved to https://phabricator.wikimedia.org/P54476 and previous config saved to /var/cache/conftool/dbconfig/20231215-140108-arnaudb.json
  • 13:54 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
  • 13:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
  • 13:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 25%: candidate master proper repooling', diff saved to https://phabricator.wikimedia.org/P54475 and previous config saved to /var/cache/conftool/dbconfig/20231215-135257-arnaudb.json
  • 13:52 arnaudb@cumin1001: dbctl commit (dc=all): 'depool db2179 to repool w/ api', diff saved to https://phabricator.wikimedia.org/P54474 and previous config saved to /var/cache/conftool/dbconfig/20231215-135228-arnaudb.json
  • 13:46 arnaudb@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 10%: candidate master repooling', diff saved to https://phabricator.wikimedia.org/P54473 and previous config saved to /var/cache/conftool/dbconfig/20231215-134603-arnaudb.json
  • 13:39 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: Test upgrade GitLab Replica with insufficient API key
  • 13:39 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Test upgrade GitLab Replica with insufficient API key
  • 12:55 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
  • 12:55 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
  • 12:25 hashar@deploy2002: Finished deploy [integration/docroot@7f6c112]: doc: add integration/tox-jenkins-override - T353515 (duration: 00m 06s)
  • 12:25 hashar@deploy2002: Started deploy [integration/docroot@7f6c112]: doc: add integration/tox-jenkins-override - T353515
  • 11:28 hashar@deploy2002: Finished deploy [gerrit/gerrit@304c63a]: wm-pcc: only act on Puppet repositories - T353181 (duration: 00m 08s)
  • 11:28 hashar@deploy2002: Started deploy [gerrit/gerrit@304c63a]: wm-pcc: only act on Puppet repositories - T353181
  • 10:56 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 10:54 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 10:52 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 09:05 moritzm: installing Linux 6.1.67 packages on Bookworm hosts
  • 08:56 XioNoX: shutdown already down IPv6 BGP session from ulsfo to the office

2023-12-14

  • 23:17 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host acmechief1002.eqiad.wmnet with OS bookworm
  • 23:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief1002.eqiad.wmnet with reason: host reimage
  • 22:57 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief1002.eqiad.wmnet with reason: host reimage
  • 22:48 brett@cumin2002: START - Cookbook sre.hosts.reimage for host acmechief1002.eqiad.wmnet with OS bookworm
  • 21:24 ssastry@deploy2002: Finished scap: Backport for Revert "Temporarily disable isPreview in Parsoid's rendering" (duration: 10m 38s)
  • 21:18 ssastry@deploy2002: ssastry: Continuing with sync
  • 21:14 ssastry@deploy2002: ssastry: Backport for Revert "Temporarily disable isPreview in Parsoid's rendering" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:13 ssastry@deploy2002: Started scap: Backport for Revert "Temporarily disable isPreview in Parsoid's rendering"
  • 20:52 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 20:51 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 20:51 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 20:51 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 20:51 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 20:50 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 20:50 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 20:50 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 20:50 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 20:49 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 20:48 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 20:48 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 20:48 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 20:47 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 20:47 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 20:46 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 20:46 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 20:46 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 20:45 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 20:45 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 20:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wdqs[1009-1010].eqiad.wmnet
  • 20:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[1009-1010].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
  • 20:40 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 20:39 bd808@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 20:39 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 20:39 bd808@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 20:39 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 20:38 bd808@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 20:38 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 20:38 bd808@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 20:38 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 20:37 ryankemper@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[1009-1010].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
  • 20:37 bd808@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 20:31 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
  • 20:23 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts wdqs[1009-1010].eqiad.wmnet
  • 20:06 jmm@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM moscovium.eqiad.wmnet
  • 20:02 jmm@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM moscovium.eqiad.wmnet
  • 19:12 brennen@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.9 refs T350085
  • 19:03 brennen: 1.42.0-wmf.9 (T350085) status: no current blockers, although we should keep an eye on T353400. rolling to all wikis.
  • 18:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54462 and previous config saved to /var/cache/conftool/dbconfig/20231214-183508-arnaudb.json
  • 18:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54461 and previous config saved to /var/cache/conftool/dbconfig/20231214-183459-arnaudb.json
  • 18:20 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54460 and previous config saved to /var/cache/conftool/dbconfig/20231214-182003-arnaudb.json
  • 18:19 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54459 and previous config saved to /var/cache/conftool/dbconfig/20231214-181954-arnaudb.json
  • 18:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54458 and previous config saved to /var/cache/conftool/dbconfig/20231214-180458-arnaudb.json
  • 18:04 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54457 and previous config saved to /var/cache/conftool/dbconfig/20231214-180449-arnaudb.json
  • 17:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54456 and previous config saved to /var/cache/conftool/dbconfig/20231214-174953-arnaudb.json
  • 17:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54455 and previous config saved to /var/cache/conftool/dbconfig/20231214-174944-arnaudb.json
  • 17:34 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54453 and previous config saved to /var/cache/conftool/dbconfig/20231214-173448-arnaudb.json
  • 17:34 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54452 and previous config saved to /var/cache/conftool/dbconfig/20231214-173439-arnaudb.json
  • 17:24 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:23 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:19 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54451 and previous config saved to /var/cache/conftool/dbconfig/20231214-171943-arnaudb.json
  • 17:19 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54450 and previous config saved to /var/cache/conftool/dbconfig/20231214-171934-arnaudb.json
  • 17:04 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 8%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54449 and previous config saved to /var/cache/conftool/dbconfig/20231214-170438-arnaudb.json
  • 17:04 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 8%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54448 and previous config saved to /var/cache/conftool/dbconfig/20231214-170428-arnaudb.json
  • 16:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 4%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54446 and previous config saved to /var/cache/conftool/dbconfig/20231214-164925-arnaudb.json
  • 16:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 4%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54445 and previous config saved to /var/cache/conftool/dbconfig/20231214-164921-arnaudb.json
  • 16:43 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:43 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:43 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:42 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:42 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:42 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:34 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 2%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54444 and previous config saved to /var/cache/conftool/dbconfig/20231214-163420-arnaudb.json
  • 16:34 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 2%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54443 and previous config saved to /var/cache/conftool/dbconfig/20231214-163416-arnaudb.json
  • 16:24 akosiaris: updates of all wikikube services done T352906
  • 16:20 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 16:18 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:18 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:18 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 16:17 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:17 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:17 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/similar-users: apply
  • 16:17 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/similar-users: apply
  • 16:17 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/similar-users: apply
  • 16:16 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 16:16 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/similar-users: apply
  • 16:16 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/similar-users: apply
  • 16:16 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/similar-users: apply
  • 16:15 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 16:15 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 16:14 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:14 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 16:14 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 16:14 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:14 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:13 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 16:13 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 16:13 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:12 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:12 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 16:12 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 16:11 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:11 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:11 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 16:10 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 16:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 16:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 16:09 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:09 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 16:09 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 16:09 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 16:08 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 16:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 16:08 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 16:08 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 16:07 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 16:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 16:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 16:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 16:06 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 16:06 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 16:06 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 16:06 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 16:05 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 16:05 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 16:05 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 16:05 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 16:04 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 16:04 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: apply
  • 16:04 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: apply
  • 16:04 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: apply
  • 16:03 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 16:03 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 16:03 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: apply
  • 16:03 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
  • 16:03 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:02 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: apply
  • 16:02 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:02 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:02 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:02 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:02 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:01 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:01 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 16:01 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 16:00 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 16:00 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 16:00 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
  • 15:59 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
  • 15:59 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
  • 15:59 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
  • 15:59 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
  • 15:58 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
  • 15:58 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 15:57 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 15:57 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 15:57 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 15:57 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 15:57 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
  • 15:57 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 15:57 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 15:56 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 15:56 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 15:55 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 15:55 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 15:54 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 15:54 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts planet2002.codfw.wmnet
  • 15:54 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:54 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: planet2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
  • 15:54 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 15:54 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 15:54 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
  • 15:54 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 15:54 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 15:53 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
  • 15:53 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 15:53 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 15:53 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 15:53 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 15:53 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 15:53 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: planet2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
  • 15:53 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 15:52 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 15:52 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 15:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 15:51 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 15:51 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 15:51 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
  • 15:51 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 15:51 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 15:51 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
  • 15:51 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
  • 15:50 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 15:50 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 15:50 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
  • 15:50 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 15:50 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 15:50 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 15:50 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 15:50 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 15:49 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 15:49 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 15:49 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 15:49 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 15:49 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 15:49 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 15:48 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 15:48 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 15:48 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 15:48 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 15:48 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 15:48 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 15:46 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 15:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1149.eqiad.wmnet onto db1249.eqiad.wmnet
  • 15:42 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 15:42 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 15:42 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 15:42 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 15:42 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 15:42 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 15:42 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts planet2002.codfw.wmnet
  • 15:40 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 15:40 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 15:40 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 15:40 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 15:40 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 15:39 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 15:35 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 15:35 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 15:31 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@4946bb7]: (no justification provided) (duration: 00m 48s)
  • 15:30 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@4946bb7]: (no justification provided)
  • 15:29 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:28 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:28 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:28 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:28 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:28 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 15:28 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 15:28 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 15:28 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:28 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 15:28 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 15:27 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 15:27 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 15:27 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 15:27 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 15:27 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 15:17 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 15:17 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 15:16 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 15:16 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 15:15 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 14:46 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:45 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:45 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:45 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:44 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:44 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:43 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:43 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:22 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:22 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:07 moritzm: installing ruby-rails-html-sanitizer security updates
  • 14:01 moritzm: installing ruby-loofah security updates
  • 13:56 moritzm: installing reportbug bugfix updates on buster
  • 13:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1137.eqiad.wmnet
  • 13:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:54 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 13:53 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
  • 13:52 moritzm: installing netty security updates
  • 13:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1148.eqiad.wmnet
  • 13:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:51 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
  • 13:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1132.eqiad.wmnet
  • 13:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1132.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 13:50 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1132.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 13:48 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
  • 13:44 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1148.eqiad.wmnet
  • 13:43 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1137.eqiad.wmnet
  • 13:42 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1132.eqiad.wmnet
  • 13:42 arnaudb@cumin1001: dbctl commit (dc=all): 'decommissioning hosts', diff saved to https://phabricator.wikimedia.org/P54437 and previous config saved to /var/cache/conftool/dbconfig/20231214-134203-arnaudb.json
  • 13:21 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1134.eqiad.wmnet onto db1234.eqiad.wmnet
  • 13:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1134 in db1234 for T344036', diff saved to https://phabricator.wikimedia.org/P54436 and previous config saved to /var/cache/conftool/dbconfig/20231214-131913-arnaudb.json
  • 13:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: provisioning db1234.eqiad.wmnet - T344036
  • 13:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: provisioning db1234.eqiad.wmnet - T344036
  • 13:17 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: provisioning db1234.eqiad.wmnet - T344036
  • 13:17 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: provisioning db1234.eqiad.wmnet - T344036
  • 13:12 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1149.eqiad.wmnet onto db1249.eqiad.wmnet
  • 13:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1149 in db1249 for T344036', diff saved to https://phabricator.wikimedia.org/P54435 and previous config saved to /var/cache/conftool/dbconfig/20231214-131017-arnaudb.json
  • 13:09 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: provisioning db1249.eqiad.wmnet - T344036
  • 13:09 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: provisioning db1249.eqiad.wmnet - T344036
  • 13:09 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: provisioning db1249.eqiad.wmnet - T344036
  • 13:08 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: provisioning db1249.eqiad.wmnet - T344036
  • 12:45 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 12:45 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 12:42 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 12:10 cgoubert@deploy2002: Finished scap: Deploying php-fpm-exporter 0.0.3 - 982431, mw-api-int: replicas x125% - 982841 (duration: 04m 16s)
  • 12:05 cgoubert@deploy2002: Started scap: Deploying php-fpm-exporter 0.0.3 - 982431, mw-api-int: replicas x125% - 982841
  • 12:03 cgoubert@deploy2002: sync-world aborted: Deploying php-fpm-exporter 0.0.3 - 982431, mw-api-int: replicas x125% - 982841 (duration: 00m 02s)
  • 12:03 cgoubert@deploy2002: Started scap: Deploying php-fpm-exporter 0.0.3 - 982431, mw-api-int: replicas x125% - 982841
  • 12:01 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 12:01 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 11:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54434 and previous config saved to /var/cache/conftool/dbconfig/20231214-115332-arnaudb.json
  • 11:51 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:49 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster1002.eqiad.wmnet
  • 11:42 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster1002.eqiad.wmnet
  • 11:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54433 and previous config saved to /var/cache/conftool/dbconfig/20231214-113826-arnaudb.json
  • 11:31 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster1001.eqiad.wmnet
  • 11:30 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2002.codfw.wmnet
  • 11:25 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster1001.eqiad.wmnet
  • 11:24 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2002.codfw.wmnet
  • 11:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54432 and previous config saved to /var/cache/conftool/dbconfig/20231214-112321-arnaudb.json
  • 11:12 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 11:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54431 and previous config saved to /var/cache/conftool/dbconfig/20231214-110816-arnaudb.json
  • 11:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54430 and previous config saved to /var/cache/conftool/dbconfig/20231214-110754-arnaudb.json
  • 11:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54429 and previous config saved to /var/cache/conftool/dbconfig/20231214-110733-arnaudb.json
  • 11:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54428 and previous config saved to /var/cache/conftool/dbconfig/20231214-110714-arnaudb.json
  • 11:06 _joe_: restarted apache2 on lists1001
  • 10:58 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 100%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54427 and previous config saved to /var/cache/conftool/dbconfig/20231214-105814-arnaudb.json
  • 10:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54426 and previous config saved to /var/cache/conftool/dbconfig/20231214-105311-arnaudb.json
  • 10:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54425 and previous config saved to /var/cache/conftool/dbconfig/20231214-105248-arnaudb.json
  • 10:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54424 and previous config saved to /var/cache/conftool/dbconfig/20231214-105228-arnaudb.json
  • 10:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54423 and previous config saved to /var/cache/conftool/dbconfig/20231214-105209-arnaudb.json
  • 10:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update codfw-eqiad transport ptr - ayounsi@cumin1001"
  • 10:45 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update codfw-eqiad transport ptr - ayounsi@cumin1001"
  • 10:43 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 90%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54422 and previous config saved to /var/cache/conftool/dbconfig/20231214-104308-arnaudb.json
  • 10:42 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 10:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 15%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54421 and previous config saved to /var/cache/conftool/dbconfig/20231214-103806-arnaudb.json
  • 10:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54420 and previous config saved to /var/cache/conftool/dbconfig/20231214-103743-arnaudb.json
  • 10:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54419 and previous config saved to /var/cache/conftool/dbconfig/20231214-103723-arnaudb.json
  • 10:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54418 and previous config saved to /var/cache/conftool/dbconfig/20231214-103704-arnaudb.json
  • 10:28 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 80%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54417 and previous config saved to /var/cache/conftool/dbconfig/20231214-102803-arnaudb.json
  • 10:26 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 10:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54416 and previous config saved to /var/cache/conftool/dbconfig/20231214-102301-arnaudb.json
  • 10:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54415 and previous config saved to /var/cache/conftool/dbconfig/20231214-102238-arnaudb.json
  • 10:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54414 and previous config saved to /var/cache/conftool/dbconfig/20231214-102218-arnaudb.json
  • 10:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54413 and previous config saved to /var/cache/conftool/dbconfig/20231214-102159-arnaudb.json
  • 10:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new cumin1002 host - jmm@cumin2002"
  • 10:16 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new cumin1002 host - jmm@cumin2002"
  • 10:14 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 10:14 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 10:14 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 10:14 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 10:14 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 10:13 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 10:12 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 70%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54412 and previous config saved to /var/cache/conftool/dbconfig/20231214-101258-arnaudb.json
  • 10:12 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 10:12 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 10:11 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 10:11 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 10:11 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 10:11 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 10:08 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 5%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54411 and previous config saved to /var/cache/conftool/dbconfig/20231214-100756-arnaudb.json
  • 10:07 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 10:07 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54410 and previous config saved to /var/cache/conftool/dbconfig/20231214-100733-arnaudb.json
  • 10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54409 and previous config saved to /var/cache/conftool/dbconfig/20231214-100713-arnaudb.json
  • 10:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 10:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 10:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54408 and previous config saved to /var/cache/conftool/dbconfig/20231214-100654-arnaudb.json
  • 10:06 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 10:05 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 10:05 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 10:05 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 10:04 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 10:04 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 10:04 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 10:00 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 09:59 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 09:59 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 09:58 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 09:58 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 09:58 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 09:57 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 60%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54407 and previous config saved to /var/cache/conftool/dbconfig/20231214-095753-arnaudb.json
  • 09:56 godog: remove >= 3 months old thanos blocks for prometheus/ops in eqiad/codfw and only for a single replica - T351927
  • 09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 5%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54406 and previous config saved to /var/cache/conftool/dbconfig/20231214-095228-arnaudb.json
  • 09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 5%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54405 and previous config saved to /var/cache/conftool/dbconfig/20231214-095208-arnaudb.json
  • 09:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 5%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54404 and previous config saved to /var/cache/conftool/dbconfig/20231214-095149-arnaudb.json
  • 09:51 hashar: Restarting CI Jenkins
  • 09:49 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 09:49 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 09:49 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 09:49 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 09:49 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 09:48 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 09:42 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 50%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54402 and previous config saved to /var/cache/conftool/dbconfig/20231214-094248-arnaudb.json
  • 09:40 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 09:39 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 09:39 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 09:38 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 09:38 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 09:38 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cumin1002.eqiad.wmnet with OS bullseye
  • 09:30 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 09:27 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 40%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54401 and previous config saved to /var/cache/conftool/dbconfig/20231214-092743-arnaudb.json
  • 09:27 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 09:27 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cumin1002.eqiad.wmnet with reason: host reimage
  • 09:25 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 09:24 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 09:24 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 09:24 akosiaris: update all the other services. T352906
  • 09:24 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 09:24 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 09:24 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 09:22 godog: delete raw replica blocks for prometheus/ops (only one replica) in codfw - T351927
  • 09:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cumin1002.eqiad.wmnet with reason: host reimage
  • 09:21 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 09:20 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 09:20 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 09:20 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 09:20 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 09:19 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 09:12 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 30%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54400 and previous config saved to /var/cache/conftool/dbconfig/20231214-091238-arnaudb.json
  • 09:12 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cumin1002.eqiad.wmnet with OS bullseye
  • 09:10 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host cumin1002.eqiad.wmnet
  • 09:10 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cumin1002.eqiad.wmnet with OS bullseye
  • 09:10 apergos: UTC morning backport and config window done
  • 09:09 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 09:08 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 09:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 09:07 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1182.eqiad.wmnet onto db1233.eqiad.wmnet
  • 09:07 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2001.codfw.wmnet
  • 09:06 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 09:06 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 09:06 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 09:03 ariel@deploy2002: Finished scap: Backport for RunSingleJob.php: Stop writing to $wgCommandLineMode (T353262) (duration: 09m 05s)
  • 09:00 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2001.codfw.wmnet
  • 08:57 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 20%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54399 and previous config saved to /var/cache/conftool/dbconfig/20231214-085733-arnaudb.json
  • 08:56 ariel@deploy2002: ariel and matmarex: Continuing with sync
  • 08:56 ariel@deploy2002: ariel and matmarex: Backport for RunSingleJob.php: Stop writing to $wgCommandLineMode (T353262) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:54 ariel@deploy2002: Started scap: Backport for RunSingleJob.php: Stop writing to $wgCommandLineMode (T353262)
  • 08:47 ariel@deploy2002: Finished scap: Backport for RunSingleJob.php: Remove overly complicated error handling (T353262) (duration: 08m 39s)
  • 08:42 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 10%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54398 and previous config saved to /var/cache/conftool/dbconfig/20231214-084228-arnaudb.json
  • 08:40 ariel@deploy2002: matmarex and ariel: Continuing with sync
  • 08:39 ariel@deploy2002: matmarex and ariel: Backport for RunSingleJob.php: Remove overly complicated error handling (T353262) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:38 ariel@deploy2002: Started scap: Backport for RunSingleJob.php: Remove overly complicated error handling (T353262)
  • 08:35 ariel@deploy2002: Finished scap: Backport for Remove references to refreshMessageBlobs.php (T314947) (duration: 10m 20s)
  • 08:34 XioNoX: drain eqiad-codfw Arelion link for 100G migration
  • 08:27 ariel@deploy2002: ariel and matmarex: Continuing with sync
  • 08:26 ariel@deploy2002: ariel and matmarex: Backport for Remove references to refreshMessageBlobs.php (T314947) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:24 ariel@deploy2002: Started scap: Backport for Remove references to refreshMessageBlobs.php (T314947)
  • 08:20 ariel@deploy2002: Finished scap: Backport for use virtual db domain for CentralAuth and GlobalBlocking (T348486) (duration: 10m 33s)
  • 08:13 ariel@deploy2002: ariel: Continuing with sync
  • 08:11 ariel@deploy2002: ariel: Backport for use virtual db domain for CentralAuth and GlobalBlocking (T348486) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:10 ariel@deploy2002: Started scap: Backport for use virtual db domain for CentralAuth and GlobalBlocking (T348486)
  • 08:08 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 08:02 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 08:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cumin1002.eqiad.wmnet with OS bullseye
  • 08:01 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 08:00 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 07:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM cumin1002.eqiad.wmnet - jmm@cumin2002"
  • 07:54 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM cumin1002.eqiad.wmnet - jmm@cumin2002"
  • 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cumin1002.eqiad.wmnet on all recursors
  • 07:54 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache cumin1002.eqiad.wmnet on all recursors
  • 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cumin1002.eqiad.wmnet - jmm@cumin2002"
  • 07:53 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cumin1002.eqiad.wmnet - jmm@cumin2002"
  • 07:50 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host cumin1002.eqiad.wmnet
  • 07:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 07:49 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 07:48 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1182.eqiad.wmnet onto db1233.eqiad.wmnet
  • 07:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 03:24 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host testhost2001.codfw.wmnet with OS bullseye
  • 03:06 bvibber: cleanupOrphanedTranscodes complete. requeueTranscodes continues... forever and ever and ever
  • 02:54 bvibber: brion running cleanupOrphanedTranscodes on commonswiki on mwmaint2002
  • 01:26 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gitlab1003.wikimedia.org with reason: upgrade gitlab1003 to new version https://phabricator.wikmedia.org/T353375
  • 01:25 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on gitlab1003.wikimedia.org with reason: upgrade gitlab1003 to new version https://phabricator.wikmedia.org/T353375
  • 01:04 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gitlab1003.wikimedia.org with reason: upgrade gitlab1003 to new version https://phabricator.wikmedia.org/T353375
  • 01:04 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:15:00 on gitlab1003.wikimedia.org with reason: upgrade gitlab1003 to new version https://phabricator.wikmedia.org/T353375
  • 00:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bullseye
  • 00:40 dzahn@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release
  • 00:40 dzahn@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
  • 00:38 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release
  • 00:38 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
  • 00:34 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=93) on GitLab host gitlab1003.wikimedia.org with reason: security release
  • 00:34 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
  • 00:18 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts planet1002.eqiad.wmnet
  • 00:18 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:18 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: planet1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
  • 00:17 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: planet1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
  • 00:15 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 00:11 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts planet1002.eqiad.wmnet

2023-12-13

  • 23:48 brett@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host acmechief1002.eqiad.wmnet
  • 23:48 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host acmechief1002.eqiad.wmnet with OS bookworm
  • 23:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bullseye
  • 23:21 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1107.eqiad.wmnet with OS bookworm
  • 23:17 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply new extra plugins - bking@cumin2002 - T353270
  • 23:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1107.eqiad.wmnet with reason: host reimage
  • 23:02 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1107.eqiad.wmnet with reason: host reimage
  • 22:58 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1006.eqiad.wmnet with OS bullseye
  • 22:58 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
  • 22:57 jhuneidi@deploy2002: Finished scap: Backport for Update wgStatsTarget to port 9125 (T240685), [BC] Enable desktop diff and history pages on mobile (T350181 T353388) (duration: 09m 42s)
  • 22:57 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
  • 22:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1005.eqiad.wmnet with OS bullseye
  • 22:54 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
  • 22:53 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
  • 22:50 jhuneidi@deploy2002: jhuneidi and jdlrobson and cwhite: Continuing with sync
  • 22:49 jhuneidi@deploy2002: jhuneidi and jdlrobson and cwhite: Backport for Update wgStatsTarget to port 9125 (T240685), [BC] Enable desktop diff and history pages on mobile (T350181 T353388) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:48 jhuneidi@deploy2002: Started scap: Backport for Update wgStatsTarget to port 9125 (T240685), [BC] Enable desktop diff and history pages on mobile (T350181 T353388)
  • 22:47 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1107.eqiad.wmnet with OS bookworm
  • 22:45 jhuneidi@deploy2002: Finished scap: Backport for tests: Use MediaWikiIntegrationTestCase::setGroupPermissions (T353210), Temporarily disable isPreview in Parsoid's rendering (duration: 10m 08s)
  • 22:45 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1107.eqiad.wmnet with OS bookworm
  • 22:45 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1107.eqiad.wmnet with OS bookworm
  • 22:42 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1006.eqiad.wmnet with reason: host reimage
  • 22:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1004.eqiad.wmnet with OS bullseye
  • 22:40 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
  • 22:39 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1005.eqiad.wmnet with reason: host reimage
  • 22:39 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
  • 22:38 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1107.eqiad.wmnet with OS bookworm
  • 22:38 jhuneidi@deploy2002: ssastry and jhuneidi: Continuing with sync
  • 22:38 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1107.eqiad.wmnet with OS bookworm
  • 22:37 jhuneidi@deploy2002: ssastry and jhuneidi: Backport for tests: Use MediaWikiIntegrationTestCase::setGroupPermissions (T353210), Temporarily disable isPreview in Parsoid's rendering synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:37 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1006.eqiad.wmnet with reason: host reimage
  • 22:36 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1005.eqiad.wmnet with reason: host reimage
  • 22:35 brett@cumin2002: START - Cookbook sre.hosts.reimage for host acmechief1002.eqiad.wmnet with OS bookworm
  • 22:35 jhuneidi@deploy2002: Started scap: Backport for tests: Use MediaWikiIntegrationTestCase::setGroupPermissions (T353210), Temporarily disable isPreview in Parsoid's rendering
  • 22:26 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1004.eqiad.wmnet with reason: host reimage
  • 22:24 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1006.eqiad.wmnet with OS bullseye
  • 22:24 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1005.eqiad.wmnet with OS bullseye
  • 22:22 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1004.eqiad.wmnet with reason: host reimage
  • 22:18 jhuneidi@deploy2002: Finished scap: Backport for Partially undeploy Reader Demographics 2 survey (T344393), Enable $wgStatsTarget for requests to mwdebug (T240685) (duration: 12m 33s)
  • 22:18 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM acmechief1002.eqiad.wmnet - brett@cumin2002"
  • 22:17 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM acmechief1002.eqiad.wmnet - brett@cumin2002"
  • 22:16 brett@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) acmechief1002.eqiad.wmnet on all recursors
  • 22:16 brett@cumin2002: START - Cookbook sre.dns.wipe-cache acmechief1002.eqiad.wmnet on all recursors
  • 22:16 brett@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:16 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM acmechief1002.eqiad.wmnet - brett@cumin2002"
  • 22:15 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM acmechief1002.eqiad.wmnet - brett@cumin2002"
  • 22:12 brett@cumin2002: START - Cookbook sre.dns.netbox
  • 22:11 brett@cumin2002: START - Cookbook sre.ganeti.makevm for new host acmechief1002.eqiad.wmnet
  • 22:11 jhuneidi@deploy2002: dani and jhuneidi and cwhite: Continuing with sync
  • 22:09 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1004.eqiad.wmnet with OS bullseye
  • 22:07 jhuneidi@deploy2002: dani and jhuneidi and cwhite: Backport for Partially undeploy Reader Demographics 2 survey (T344393), Enable $wgStatsTarget for requests to mwdebug (T240685) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:05 jhuneidi@deploy2002: Started scap: Backport for Partially undeploy Reader Demographics 2 survey (T344393), Enable $wgStatsTarget for requests to mwdebug (T240685)
  • 22:01 jhuneidi@deploy2002: Finished scap: Backport for Restore fixed width and height, direction of arrow on change list pages (T352456 T353099) (duration: 10m 28s)
  • 21:59 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply new extra plugins - bking@cumin2002 - T353270
  • 21:54 jhuneidi@deploy2002: jhuneidi and jdlrobson: Continuing with sync
  • 21:52 jhuneidi@deploy2002: jhuneidi and jdlrobson: Backport for Restore fixed width and height, direction of arrow on change list pages (T352456 T353099) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:50 jhuneidi@deploy2002: Started scap: Backport for Restore fixed width and height, direction of arrow on change list pages (T352456 T353099)
  • 21:04 cstone: civicrm upgraded from 834606ef to e2d49d10
  • 20:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts planet1002.eqiad.wmnet
  • 20:33 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:32 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 20:28 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts planet1002.eqiad.wmnet
  • 19:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2031.codfw.wmnet
  • 19:31 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2031.codfw.wmnet
  • 19:19 brennen@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.9 refs T350085 (duration: 07m 29s)
  • 19:12 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.9 refs T350085
  • 19:03 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply new extra plugins - bking@cumin2002 - T353270
  • 19:01 brennen: 1.42.0-wmf.9 (T350085) status: no blockers, rolling to group1
  • 18:07 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 18:07 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 18:06 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 18:05 rzl@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 17:58 rzl@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 17:57 rzl@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 17:44 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply new extra plugins - bking@cumin2002 - T353270
  • 17:27 rzl@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:25 rzl@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:56 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore1006']
  • 16:56 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore1005']
  • 16:56 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore1004']
  • 16:55 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore1006']
  • 16:55 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore1005']
  • 16:55 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore1004']
  • 16:53 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1148.eqiad.wmnet onto db1248.eqiad.wmnet
  • 16:39 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 16:39 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new extra plugins - bking@cumin2002 - T353270
  • 16:38 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 16:38 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 100%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54395 and previous config saved to /var/cache/conftool/dbconfig/20231213-163657-arnaudb.json
  • 16:36 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:36 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:36 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:35 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:35 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:34 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:34 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:31 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:30 vriley@cumin1001: START - Cookbook sre.hosts.provision for host sessionstore1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:29 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sessionstore1006
  • 16:28 vriley@cumin1001: START - Cookbook sre.hosts.provision for host sessionstore1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:27 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sessionstore1006
  • 16:27 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 16:26 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sessionstore1005
  • 16:26 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 16:25 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sessionstore1005
  • 16:23 vriley@cumin1001: START - Cookbook sre.hosts.provision for host sessionstore1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:22 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sessionstore1004
  • 16:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 90%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54394 and previous config saved to /var/cache/conftool/dbconfig/20231213-162152-arnaudb.json
  • 16:20 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sessionstore1004
  • 16:19 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 16:19 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 16:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 16:16 ladsgroup@deploy2002: Finished scap: Backport for Fix my email in the key list (duration: 08m 45s)
  • 16:15 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 16:15 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 16:14 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new extra plugins - bking@cumin2002 - T353270
  • 16:13 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 16:12 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 16:12 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 16:11 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 16:10 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 16:09 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 16:09 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 16:09 ladsgroup@deploy2002: ladsgroup: Backport for Fix my email in the key list synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:08 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:07 ladsgroup@deploy2002: Started scap: Backport for Fix my email in the key list
  • 16:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 80%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54393 and previous config saved to /var/cache/conftool/dbconfig/20231213-160647-arnaudb.json
  • 16:05 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:05 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
  • 16:04 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
  • 16:04 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
  • 16:04 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:04 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
  • 16:04 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 16:03 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 16:03 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:03 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:01 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:01 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:01 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 16:01 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 16:00 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 16:00 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 16:00 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 16:00 akosiaris: upgrade apertium, blubberoid everywhere to use the latest service_proxy image, 1.23.10-2-s4-20231203 T352906
  • 15:59 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/apertium: apply
  • 15:59 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
  • 15:59 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 15:59 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/apertium: apply
  • 15:59 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/apertium: apply
  • 15:58 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/apertium: apply
  • 15:58 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 15:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1132.eqiad.wmnet onto db1232.eqiad.wmnet
  • 15:58 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 15:57 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 15:56 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 15:56 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 15:52 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 15:51 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 15:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 70%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54392 and previous config saved to /var/cache/conftool/dbconfig/20231213-155142-arnaudb.json
  • 15:51 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 15:51 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 15:50 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 15:49 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 15:46 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 15:45 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 15:44 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 15:43 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 15:40 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 15:39 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 15:39 claime: Deploying shellbox: update php-fpm-exporter version - 982432
  • 15:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 60%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54389 and previous config saved to /var/cache/conftool/dbconfig/20231213-153636-arnaudb.json
  • 15:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1147.eqiad.wmnet
  • 15:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1147.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:35 Amir1: tagging 1.41.0-rc.0 in core
  • 15:35 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1147.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:34 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:34 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:33 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
  • 15:28 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1147.eqiad.wmnet
  • 15:25 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1129.eqiad.wmnet
  • 15:25 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:25 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1129.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:24 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1129.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:21 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
  • 15:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 50%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54387 and previous config saved to /var/cache/conftool/dbconfig/20231213-152131-arnaudb.json
  • 15:17 arnaudb@cumin1001: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
  • 15:16 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1129.eqiad.wmnet
  • 15:15 ladsgroup@deploy2002: Finished scap: Backport for docroot: Add my pgp key (duration: 09m 50s)
  • 15:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1128.eqiad.wmnet
  • 15:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1128.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:12 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1128.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:10 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
  • 15:09 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 15:07 ladsgroup@deploy2002: ladsgroup: Backport for docroot: Add my pgp key synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 40%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54386 and previous config saved to /var/cache/conftool/dbconfig/20231213-150626-arnaudb.json
  • 15:06 ladsgroup@deploy2002: Started scap: Backport for docroot: Add my pgp key
  • 15:05 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1128.eqiad.wmnet
  • 15:04 arnaudb@cumin1001: dbctl commit (dc=all): 'decommission db1128 29 and 47', diff saved to https://phabricator.wikimedia.org/P54385 and previous config saved to /var/cache/conftool/dbconfig/20231213-150425-arnaudb.json
  • 15:00 Lucas_WMDE: UTC afternoon backport+config window done
  • 15:00 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for CheckUser: Enable read new for event tables migration on group1 (T341829) (duration: 08m 29s)
  • 14:53 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and dreamyjazz: Continuing with sync
  • 14:53 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and dreamyjazz: Backport for CheckUser: Enable read new for event tables migration on group1 (T341829) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:51 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for CheckUser: Enable read new for event tables migration on group1 (T341829)
  • 14:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 30%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54384 and previous config saved to /var/cache/conftool/dbconfig/20231213-145121-arnaudb.json
  • 14:49 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Utilities/Yaml: Use string as value with ini_set (T348496) (duration: 19m 09s)
  • 14:43 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:43 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:42 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and abi: Continuing with sync
  • 14:42 hashar: Restarted Gerrit on gerrit1003 and gerrit2002
  • 14:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 20%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54383 and previous config saved to /var/cache/conftool/dbconfig/20231213-143616-arnaudb.json
  • 14:33 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 14:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and abi: Backport for Utilities/Yaml: Use string as value with ini_set (T348496) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:30 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Utilities/Yaml: Use string as value with ini_set (T348496)
  • 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1031.eqiad.wmnet
  • 14:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 10%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54381 and previous config saved to /var/cache/conftool/dbconfig/20231213-142111-arnaudb.json
  • 14:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
  • 14:02 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1148.eqiad.wmnet onto db1248.eqiad.wmnet
  • 14:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1148 in db1248 for T344036', diff saved to https://phabricator.wikimedia.org/P54380 and previous config saved to /var/cache/conftool/dbconfig/20231213-140017-arnaudb.json
  • 13:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: provisionning db1248.eqiad.wmnet - T344036
  • 13:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: provisionning db1248.eqiad.wmnet - T344036
  • 13:57 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: provisionning db1248.eqiad.wmnet - T344036
  • 13:53 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
  • 13:53 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 13:51 arnaudb@cumin1001: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
  • 13:51 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
  • 13:50 moritzm: installing postgresql-11 security updates
  • 13:49 arnaudb@cumin1001: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
  • 13:48 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
  • 13:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1129 in db1233 for T344036', diff saved to https://phabricator.wikimedia.org/P54379 and previous config saved to /var/cache/conftool/dbconfig/20231213-134632-arnaudb.json
  • 13:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: provisionning db1233.eqiad.wmnet - T344036
  • 13:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: provisionning db1233.eqiad.wmnet - T344036
  • 13:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: provisionning db1233.eqiad.wmnet - T344036
  • 13:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: provisionning db1233.eqiad.wmnet - T344036
  • 13:27 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1132.eqiad.wmnet onto db1232.eqiad.wmnet
  • 13:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1132 in db1232 for T344036', diff saved to https://phabricator.wikimedia.org/P54376 and previous config saved to /var/cache/conftool/dbconfig/20231213-132511-arnaudb.json
  • 13:24 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: provisionning db1232.eqiad.wmnet - T344036
  • 13:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: provisionning db1232.eqiad.wmnet - T344036
  • 13:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: provisionning db1232.eqiad.wmnet - T344036
  • 13:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: provisionning db1232.eqiad.wmnet - T344036
  • 13:05 godog: delete raw replica blocks for prometheus/ops (only one replica) in eqiad - T351927
  • 12:55 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1211.eqiad.wmnet onto db1226.eqiad.wmnet
  • 12:42 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:42 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:40 moritzm: installing OpenSSH security updates on bullseye
  • 12:25 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:25 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:16 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:16 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:11 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:11 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:10 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:09 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:08 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:08 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1233.eqiad.wmnet with OS bookworm
  • 12:02 vgutierrez: setting cp4037 as inactive - T352876
  • 11:49 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1233.eqiad.wmnet with reason: host reimage
  • 11:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1233.eqiad.wmnet with reason: host reimage
  • 11:37 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:36 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:33 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db1233.eqiad.wmnet with OS bookworm
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
  • 11:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
  • 11:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
  • 11:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
  • 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5004.eqsin.wmnet
  • 11:00 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
  • 10:50 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
  • 10:49 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1211.eqiad.wmnet onto db1226.eqiad.wmnet
  • 10:48 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5004.eqsin.wmnet
  • 10:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
  • 10:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
  • 10:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
  • 10:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
  • 10:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1226.eqiad.wmnet with OS bookworm
  • 10:31 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 10:24 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 10:24 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:24 claime: Updating mw-debug prometheus-php-fpm-exporter to 0.0.3
  • 10:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1226.eqiad.wmnet with reason: host reimage
  • 10:11 hashar@deploy2002: Finished deploy [releng/jenkins-deploy@77b3681] (releasing): Rename jenkins-slave to jenkins-agent - T254646 (duration: 00m 42s)
  • 10:11 hashar@deploy2002: Started deploy [releng/jenkins-deploy@77b3681] (releasing): Rename jenkins-slave to jenkins-agent - T254646
  • 10:10 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1226.eqiad.wmnet with reason: host reimage
  • 10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 100%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54374 and previous config saved to /var/cache/conftool/dbconfig/20231213-100708-arnaudb.json
  • 10:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54373 and previous config saved to /var/cache/conftool/dbconfig/20231213-100651-arnaudb.json
  • 10:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54372 and previous config saved to /var/cache/conftool/dbconfig/20231213-100555-arnaudb.json
  • 10:00 moritzm: failover ganeti master in eqsin to ganeti5007
  • 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
  • 09:57 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db1226.eqiad.wmnet with OS bookworm
  • 09:56 hashar: Disabled puppet agent on contint1002, contint2002, releases1003 and releases2003 to progressively deploy https://gerrit.wikimedia.org/r/922555
  • 09:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
  • 09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 90%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54371 and previous config saved to /var/cache/conftool/dbconfig/20231213-095203-arnaudb.json
  • 09:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 90%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54370 and previous config saved to /var/cache/conftool/dbconfig/20231213-095146-arnaudb.json
  • 09:50 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 90%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54369 and previous config saved to /var/cache/conftool/dbconfig/20231213-095049-arnaudb.json
  • 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5005.eqsin.wmnet
  • 09:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5005.eqsin.wmnet
  • 09:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5006.eqsin.wmnet
  • 09:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 80%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54368 and previous config saved to /var/cache/conftool/dbconfig/20231213-093658-arnaudb.json
  • 09:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 80%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54367 and previous config saved to /var/cache/conftool/dbconfig/20231213-093641-arnaudb.json
  • 09:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 80%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54366 and previous config saved to /var/cache/conftool/dbconfig/20231213-093544-arnaudb.json
  • 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5006.eqsin.wmnet
  • 09:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5006.eqsin.wmnet
  • 09:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
  • 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
  • 09:25 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
  • 09:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:25 brouberol: increasing pod max requested memory to a higher value than the container max requested memory for dse-k8s-eqiad - T351722
  • 09:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 70%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54365 and previous config saved to /var/cache/conftool/dbconfig/20231213-092153-arnaudb.json
  • 09:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 70%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54364 and previous config saved to /var/cache/conftool/dbconfig/20231213-092136-arnaudb.json
  • 09:20 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 70%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54363 and previous config saved to /var/cache/conftool/dbconfig/20231213-092039-arnaudb.json
  • 09:20 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
  • 09:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
  • 09:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 60%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54362 and previous config saved to /var/cache/conftool/dbconfig/20231213-090648-arnaudb.json
  • 09:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 60%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54361 and previous config saved to /var/cache/conftool/dbconfig/20231213-090631-arnaudb.json
  • 09:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 60%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54360 and previous config saved to /var/cache/conftool/dbconfig/20231213-090534-arnaudb.json
  • 08:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 202120
  • 08:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 202120
  • 08:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 50%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54359 and previous config saved to /var/cache/conftool/dbconfig/20231213-085143-arnaudb.json
  • 08:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54358 and previous config saved to /var/cache/conftool/dbconfig/20231213-085125-arnaudb.json
  • 08:50 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54357 and previous config saved to /var/cache/conftool/dbconfig/20231213-085027-arnaudb.json
  • 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5007.eqsin.wmnet
  • 08:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3856
  • 08:48 XioNoX: delete bgp group Confed_drmrs from cr1-esams - T347892
  • 08:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3856
  • 08:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5007.eqsin.wmnet
  • 08:44 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 46997
  • 08:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 46997
  • 08:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 40%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54356 and previous config saved to /var/cache/conftool/dbconfig/20231213-083638-arnaudb.json
  • 08:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 40%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54355 and previous config saved to /var/cache/conftool/dbconfig/20231213-083620-arnaudb.json
  • 08:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 40%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54354 and previous config saved to /var/cache/conftool/dbconfig/20231213-083522-arnaudb.json
  • 08:30 XioNoX: delete bgp group Confed_esams from cr2-drmrs - T347892
  • 08:25 mlitn@deploy2002: Finished scap: Backport for No custom UW licensing config (duration: 09m 43s)
  • 08:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 30%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54353 and previous config saved to /var/cache/conftool/dbconfig/20231213-082133-arnaudb.json
  • 08:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 30%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54352 and previous config saved to /var/cache/conftool/dbconfig/20231213-082115-arnaudb.json
  • 08:20 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 30%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54351 and previous config saved to /var/cache/conftool/dbconfig/20231213-082017-arnaudb.json
  • 08:18 mlitn@deploy2002: mlitn: Continuing with sync
  • 08:17 mlitn@deploy2002: mlitn: Backport for No custom UW licensing config synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:16 mlitn@deploy2002: Started scap: Backport for No custom UW licensing config
  • 08:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1020.eqiad.wmnet with OS bookworm
  • 08:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 20%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54350 and previous config saved to /var/cache/conftool/dbconfig/20231213-080628-arnaudb.json
  • 08:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54349 and previous config saved to /var/cache/conftool/dbconfig/20231213-080610-arnaudb.json
  • 08:06 moritzm: installing OpenSSH security updates
  • 08:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54348 and previous config saved to /var/cache/conftool/dbconfig/20231213-080512-arnaudb.json
  • 07:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1020.eqiad.wmnet with reason: host reimage
  • 07:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1020.eqiad.wmnet with reason: host reimage
  • 07:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 10%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54347 and previous config saved to /var/cache/conftool/dbconfig/20231213-075123-arnaudb.json
  • 07:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54346 and previous config saved to /var/cache/conftool/dbconfig/20231213-075105-arnaudb.json
  • 07:50 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54345 and previous config saved to /var/cache/conftool/dbconfig/20231213-075006-arnaudb.json
  • 07:43 arnaudb@cumin1001: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1211.eqiad.wmnet onto db1226.eqiad.wmnet
  • 07:40 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1020.eqiad.wmnet with OS bookworm
  • 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1021.eqiad.wmnet with OS bookworm
  • 06:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1021.eqiad.wmnet with reason: host reimage
  • 06:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1021.eqiad.wmnet with reason: host reimage
  • 05:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1021.eqiad.wmnet with OS bookworm
  • 03:41 hashar@deploy2002: Finished deploy [gerrit/gerrit@9bf8914]: Add a banner for the 2023 developer survey - T351109 (duration: 00m 08s)
  • 03:41 hashar@deploy2002: Started deploy [gerrit/gerrit@9bf8914]: Add a banner for the 2023 developer survey - T351109

2023-12-12

  • 23:56 ejegg: donorwiki upgraded from f7407053 to bc49e5a6
  • 23:26 tzatziki: removing 2 files for legal compliance
  • 23:05 tzatziki: removing 2 files for legal compliance
  • 22:57 mutante: planet - switched to eqiad and bookworm backend (T348392 T345617) - https://meta.wikimedia.org/wiki/Planet_Wikimedia
  • 22:43 mutante: planet2003 - manually upgraded rawdog package to 3.0.2 - T348392
  • 21:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on planet1003.eqiad.wmnet with reason: debugging
  • 21:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on planet1003.eqiad.wmnet with reason: debugging
  • 21:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on planet1003.eqiad.wmnet with reason: debugging
  • 21:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on planet2003.codfw.wmnet with reason: debugging
  • 21:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on planet2003.codfw.wmnet with reason: debugging
  • 21:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on planet2003.codfw.wmnet with reason: reimage
  • 21:18 samtar@deploy2002: Finished scap: Backport for Add stream config for Android article instruments (T351292) (duration: 11m 59s)
  • 21:10 samtar@deploy2002: cjming and samtar: Continuing with sync
  • 21:07 samtar@deploy2002: cjming and samtar: Backport for Add stream config for Android article instruments (T351292) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:06 samtar@deploy2002: Started scap: Backport for Add stream config for Android article instruments (T351292)
  • 20:42 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 20:40 rzl@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 20:38 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 20:37 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 20:33 rzl@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 20:30 rzl@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 20:28 rzl@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 20:17 rzl@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 20:05 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 20:04 rzl@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 19:59 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: enable new wmf-elasticsearch-search-plugins - bking@cumin2002 - T353270
  • 19:57 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 19:56 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 19:46 ryankemper@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:46 ryankemper@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:43 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.9 refs T350085
  • 19:33 brennen@deploy2002: Finished scap: Backport for ParserOutput::getText(): do not clone ParserOutput when invoking pipeline (T353257) (duration: 09m 41s)
  • 19:26 brennen@deploy2002: brennen and ssastry: Continuing with sync
  • 19:25 brennen@deploy2002: brennen and ssastry: Backport for ParserOutput::getText(): do not clone ParserOutput when invoking pipeline (T353257) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 19:23 brennen@deploy2002: Started scap: Backport for ParserOutput::getText(): do not clone ParserOutput when invoking pipeline (T353257)
  • 19:18 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: enable new wmf-elasticsearch-search-plugins - bking@cumin2002 - T353270
  • 19:08 brennen: 1.42.0-wmf.9 (T350085) status: deploying a fix for T353257 and then will proceed to group0.
  • 19:03 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: enable new wmf-elasticsearch-search-plugins - bking@cumin2002 - T353270
  • 19:03 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: enable new wmf-elasticsearch-search-plugins - bking@cumin2002 - T353270
  • 18:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host phab2002.codfw.wmnet with OS bullseye
  • 18:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab2002.codfw.wmnet with reason: host reimage
  • 18:32 rzl@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:31 rzl@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 18:29 rzl@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 18:28 rzl@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 18:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet with reason: host reimage
  • 18:12 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host phab2002.codfw.wmnet with OS bullseye
  • 18:10 mutante: reimaging phab2002 (stand-by phorge server) with bullseye - T327068
  • 17:42 ejegg: fundraising civicrm upgraded from 8c107215 to 834606ef
  • 17:33 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:33 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt sessionstore - jclark@cumin1001"
  • 17:32 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt sessionstore - jclark@cumin1001"
  • 17:32 ejegg: payments-wiki upgraded from 1d24dc90 to c1181b95
  • 17:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host testhost2001.codfw.wmnet with OS bullseye
  • 17:30 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 17:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on phab2002.codfw.wmnet with reason: reimage
  • 17:16 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on phab2002.codfw.wmnet with reason: reimage
  • 17:16 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:16 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:13 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:13 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:13 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:34 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ml-staging2001.codfw.wmnet with reason: Waiting for hardware install
  • 16:33 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ml-staging2001.codfw.wmnet with reason: Waiting for hardware install
  • 16:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2001.codfw.wmnet
  • 16:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes1060']
  • 16:19 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1060']
  • 16:15 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2001.codfw.wmnet
  • 16:05 brennen@deploy2002: Finished deploy [phabricator/deployment@c243cc2]: deploy to phab1004 for T353274 (duration: 00m 48s)
  • 16:04 brennen@deploy2002: Started deploy [phabricator/deployment@c243cc2]: deploy to phab1004 for T353274
  • 16:04 brennen@deploy2002: Finished deploy [phabricator/deployment@c243cc2]: test deploy to phab2002 for T353274 (duration: 00m 32s)
  • 16:03 brennen@deploy2002: Started deploy [phabricator/deployment@c243cc2]: test deploy to phab2002 for T353274
  • 16:03 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1004.eqiad.wmnet with reason: Phabricator deploys
  • 16:03 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1004.eqiad.wmnet with reason: Phabricator deploys
  • 16:00 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 15:59 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 15:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bullseye
  • 15:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1137.eqiad.wmnet onto db1237.eqiad.wmnet
  • 15:30 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:30 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:30 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:29 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:28 cgoubert@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:28 cgoubert@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:27 cgoubert@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 15:27 cgoubert@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 15:27 cgoubert@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:26 cgoubert@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 15:25 cgoubert@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:25 cgoubert@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 15:25 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:24 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:23 cgoubert@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:22 cgoubert@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:22 cgoubert@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:21 cgoubert@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:21 claime: Deploying new calico BGPPeers for codfw rows a/b - T352893
  • 14:54 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1137.eqiad.wmnet onto db1237.eqiad.wmnet
  • 14:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1137 in db1237 for T344036', diff saved to https://phabricator.wikimedia.org/P54339 and previous config saved to /var/cache/conftool/dbconfig/20231212-145205-arnaudb.json
  • 14:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1237.eqiad.wmnet with reason: provisionning db1237.eqiad.wmnet - T344036
  • 14:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1237.eqiad.wmnet with reason: provisionning db1237.eqiad.wmnet - T344036
  • 14:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: provisionning db1237.eqiad.wmnet - T344036
  • 14:50 dcausse: restarting blazegraph on wdqs1012 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 14:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: provisionning db1237.eqiad.wmnet - T344036
  • 14:48 phuedx: UTC afternoon backport window done
  • 14:47 phuedx@deploy2002: Finished scap: Backport for Partially undeploy Reader Demographics 2 survey (T344393) (duration: 24m 33s)
  • 14:39 phuedx@deploy2002: phuedx and dani: Continuing with sync
  • 14:35 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1211.eqiad.wmnet onto db1226.eqiad.wmnet
  • 14:35 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on netbox-dev2002.codfw.wmnet with reason: Restoring DB from backup on netbox-dev2002
  • 14:34 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on netbox-dev2002.codfw.wmnet with reason: Restoring DB from backup on netbox-dev2002
  • 14:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1211 in db1226 for T344036', diff saved to https://phabricator.wikimedia.org/P54336 and previous config saved to /var/cache/conftool/dbconfig/20231212-143233-arnaudb.json
  • 14:31 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
  • 14:31 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
  • 14:31 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
  • 14:30 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
  • 14:24 phuedx@deploy2002: phuedx and dani: Backport for Partially undeploy Reader Demographics 2 survey (T344393) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:22 phuedx@deploy2002: Started scap: Backport for Partially undeploy Reader Demographics 2 survey (T344393)
  • 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 13:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 13:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:45 brouberol: increasing max container memory requests in dse-k8s from 3GB to 8GB - T351722
  • 13:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1128.eqiad.wmnet onto db1228.eqiad.wmnet
  • 13:19 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1129.eqiad.wmnet onto db1229.eqiad.wmnet
  • 13:16 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2002.codfw.wmnet
  • 13:16 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster1002.eqiad.wmnet
  • 13:09 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster1002.eqiad.wmnet
  • 13:09 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2002.codfw.wmnet
  • 13:06 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1147.eqiad.wmnet onto db1247.eqiad.wmnet
  • 13:00 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster1001.eqiad.wmnet
  • 12:57 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2001.codfw.wmnet
  • 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host backup1011.eqiad.wmnet
  • 12:55 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster1001.eqiad.wmnet
  • 12:53 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 12:52 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 12:51 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2001.codfw.wmnet
  • 12:46 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host backup1011.eqiad.wmnet
  • 12:45 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host backup1010.eqiad.wmnet
  • 12:45 jayme: increasing memory of ganeti instance kubemaster2001.codfw.wmnet from 4G to 12G (requires reboot) - T353233
  • 12:38 claime: Uncordoning kubernetes10[59-62].eqiad.wmnet - T353135
  • 12:37 claime: Pooling kubernetes10[59-62].eqiad.wmnet - T353135
  • 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host backup2011.codfw.wmnet
  • 12:24 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host backup2011.codfw.wmnet
  • 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host backup2010.codfw.wmnet
  • 12:03 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host backup2010.codfw.wmnet
  • 11:43 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 11:43 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 11:28 moritzm: installing postgresql-11 security updates
  • 10:50 samtar@deploy2002: Finished scap: Backport for testwiki: Enable the Edit Recovery feature (T353041) (duration: 09m 51s)
  • 10:47 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1129.eqiad.wmnet onto db1229.eqiad.wmnet
  • 10:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1129 in db1229 for T344036', diff saved to https://phabricator.wikimedia.org/P54335 and previous config saved to /var/cache/conftool/dbconfig/20231212-104404-arnaudb.json
  • 10:43 samtar@deploy2002: samtar and samwilson: Continuing with sync
  • 10:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: provisionning db1229.eqiad.wmnet - T344036
  • 10:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: provisionning db1229.eqiad.wmnet - T344036
  • 10:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: provisionning db1229.eqiad.wmnet - T344036
  • 10:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: provisionning db1229.eqiad.wmnet - T344036
  • 10:41 samtar@deploy2002: samtar and samwilson: Backport for testwiki: Enable the Edit Recovery feature (T353041) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:40 samtar@deploy2002: Started scap: Backport for testwiki: Enable the Edit Recovery feature (T353041)
  • 10:30 moritzm: installing nghttp2 security updates
  • 10:16 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 10:15 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 10:13 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:13 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:09 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:09 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:05 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:04 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:04 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 10:04 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 09:57 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1128.eqiad.wmnet onto db1228.eqiad.wmnet
  • 09:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 clone from db1128 ', diff saved to https://phabricator.wikimedia.org/P54334 and previous config saved to /var/cache/conftool/dbconfig/20231212-095352-arnaudb.json
  • 09:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: provisionning db1228.eqiad.wmnet - T344036
  • 09:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: provisionning db1228.eqiad.wmnet - T344036
  • 09:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: provisionning db1228.eqiad.wmnet - T344036
  • 09:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: provisionning db1228.eqiad.wmnet - T344036
  • 09:43 moritzm: installing ca-certificates-java updates from Bookworm point release
  • 09:08 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1147.eqiad.wmnet onto db1247.eqiad.wmnet
  • 09:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1147 in db1247 for T344036', diff saved to https://phabricator.wikimedia.org/P54333 and previous config saved to /var/cache/conftool/dbconfig/20231212-090652-arnaudb.json
  • 09:05 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 09:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: provisionning db1247.eqiad.wmnet - T344036
  • 09:05 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: provisionning db1247.eqiad.wmnet - T344036
  • 09:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: provisionning db1247.eqiad.wmnet - T344036
  • 09:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: provisionning db1247.eqiad.wmnet - T344036
  • 08:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: server BGP in netbox plugin - ayounsi@cumin1001
  • 08:48 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: server BGP in netbox plugin - ayounsi@cumin1001
  • 08:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2135,2160].codfw.wmnet,db[1176,1217].eqiad.wmnet with reason: m5 ipoid maintenance
  • 08:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2135,2160].codfw.wmnet,db[1176,1217].eqiad.wmnet with reason: m5 ipoid maintenance
  • 07:52 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 07:52 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
  • 07:50 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 07:49 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 07:17 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 4800
  • 07:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4800
  • 06:46 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc2014 as master of pc1" (duration: 09m 00s)
  • 06:38 marostegui@deploy2002: marostegui: Continuing with sync
  • 06:38 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc2014 as master of pc1" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:37 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc2014 as master of pc1"
  • 06:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2011.codfw.wmnet with OS bookworm
  • 06:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2011.codfw.wmnet with reason: host reimage
  • 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2011.codfw.wmnet with reason: host reimage
  • 06:00 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2011.codfw.wmnet with OS bookworm
  • 05:59 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc2014 as master of pc1 (T351787) (duration: 08m 35s)
  • 05:52 marostegui@deploy2002: marostegui: Continuing with sync
  • 05:52 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc2014 as master of pc1 (T351787) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 05:51 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc2014 as master of pc1 (T351787)
  • 05:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: pc1 master switch T351787
  • 05:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: pc1 master switch T351787
  • 04:58 mwpresync@deploy2002: Pruned MediaWiki: 1.42.0-wmf.5 (duration: 02m 17s)
  • 04:55 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.9 refs T350085 (duration: 53m 03s)
  • 04:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.9 refs T350085

2023-12-11

  • 22:39 jdrewniak@deploy2002: Finished scap: Backport for [Vector] Deploy the Zebra CSS refactor under feature flag (T353008) (duration: 12m 14s)
  • 22:32 jdrewniak@deploy2002: jdrewniak: Continuing with sync
  • 22:28 jdrewniak@deploy2002: jdrewniak: Backport for [Vector] Deploy the Zebra CSS refactor under feature flag (T353008) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:26 jdrewniak@deploy2002: Started scap: Backport for [Vector] Deploy the Zebra CSS refactor under feature flag (T353008)
  • 22:23 ladsgroup@deploy2002: Finished scap: Backport for api: Add support for pagelinks migration in ApiQueryBacklinks::runSecondQuery (T351237) (duration: 10m 42s)
  • 22:15 ladsgroup@deploy2002: jforrester and ladsgroup: Continuing with sync
  • 22:14 ladsgroup@deploy2002: jforrester and ladsgroup: Backport for api: Add support for pagelinks migration in ApiQueryBacklinks::runSecondQuery (T351237) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:12 ladsgroup@deploy2002: Started scap: Backport for api: Add support for pagelinks migration in ApiQueryBacklinks::runSecondQuery (T351237)
  • 22:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on wdqs1015.eqiad.wmnet with reason: T347355
  • 22:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 18:00:00 on wdqs1015.eqiad.wmnet with reason: T347355
  • 18:34 claime: Raised replicas for mw-web
  • 18:32 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 18:32 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 18:32 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 18:32 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 18:32 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 18:31 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 17:48 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:47 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:47 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:45 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:45 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 17:43 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 17:43 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:42 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 17:04 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 08m 15s)
  • 17:01 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2004.codfw.wmnet with OS bullseye
  • 17:00 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:57 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:56 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 10m 12s)
  • 16:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1060.eqiad.wmnet with OS bullseye
  • 16:49 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1061.eqiad.wmnet with OS bullseye
  • 16:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1062.eqiad.wmnet with OS bullseye
  • 16:43 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1059.eqiad.wmnet with OS bullseye
  • 16:42 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
  • 16:39 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
  • 16:27 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1060.eqiad.wmnet with reason: host reimage
  • 16:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2002.codfw.wmnet with OS bullseye
  • 16:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1061.eqiad.wmnet with reason: host reimage
  • 16:23 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1062.eqiad.wmnet with reason: host reimage
  • 16:22 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
  • 16:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2005.codfw.wmnet with OS bullseye
  • 16:21 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:20 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1059.eqiad.wmnet with reason: host reimage
  • 16:19 otto@deploy2002: Synchronized wmf-config/ext-EventStreamConfig.php: Config: Enable canary events for all MediaWiki event streams (T266798) (duration: 08m 25s)
  • 16:18 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1062.eqiad.wmnet with reason: host reimage
  • 16:18 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1061.eqiad.wmnet with reason: host reimage
  • 16:17 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1060.eqiad.wmnet with reason: host reimage
  • 16:16 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1059.eqiad.wmnet with reason: host reimage
  • 16:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:13 vgutierrez: rolling restart of pybal on lvs1020 and lvs1017 effectively enabling IPIP encapsulation on ncredir@eqiad - T351069
  • 16:10 ottomata: enabling canary events for all mediawiki state change event streams - T266798
  • 16:03 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1062.eqiad.wmnet with OS bullseye
  • 16:03 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1061.eqiad.wmnet with OS bullseye
  • 16:02 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1060.eqiad.wmnet with OS bullseye
  • 16:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
  • 16:01 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1059.eqiad.wmnet with OS bullseye
  • 16:01 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 16:00 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:59 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:58 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:57 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:57 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 15:56 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
  • 15:55 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 15:55 claime: homer lsw1-*eqiad* commit "Put kubernetes10[59-62] in production - T353135"
  • 15:55 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 15:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2002.codfw.wmnet with reason: host reimage
  • 15:55 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:55 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 15:54 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:53 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:53 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 15:53 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:51 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2002.codfw.wmnet with reason: host reimage
  • 15:49 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:48 jayme@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:41 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2005.codfw.wmnet with OS bullseye
  • 15:39 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2006.codfw.wmnet with OS bullseye
  • 15:39 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1143.eqiad.wmnet
  • 15:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1143.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
  • 15:32 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1143.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:30 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
  • 15:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:25 brouberol: provisioning TLS certificates for the spark-history and spark-history-test namespaces in dse-k8s-eqiad - T352639
  • 15:25 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1143.eqiad.wmnet
  • 15:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:23 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1142.eqiad.wmnet
  • 15:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1142.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:20 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1142.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:18 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
  • 15:12 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1142.eqiad.wmnet
  • 15:08 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
  • 15:04 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
  • 15:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1141.eqiad.wmnet
  • 15:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1141.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:01 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1141.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 14:57 milimetric@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 14:57 milimetric@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 14:57 milimetric@deploy2002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 14:56 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
  • 14:56 milimetric@deploy2002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 14:53 milimetric@deploy2002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 14:53 milimetric@deploy2002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 14:53 arnaudb@cumin1001: dbctl commit (dc=all): 'decommission db1141 42 and 43', diff saved to https://phabricator.wikimedia.org/P54330 and previous config saved to /var/cache/conftool/dbconfig/20231211-145300-arnaudb.json
  • 14:52 milimetric@deploy2002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 14:52 milimetric@deploy2002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 14:51 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1141.eqiad.wmnet
  • 14:51 milimetric@deploy2002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 14:51 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 14:50 milimetric@deploy2002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 14:50 otto@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 14:49 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 14:49 milimetric@deploy2002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 14:48 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 14:48 milimetric@deploy2002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 14:47 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
  • 14:46 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 14:45 otto@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 14:45 ottomata: deploying changeprop to pick up https://phabricator.wikimedia.org/T351247
  • 14:37 TheresNoTime: close UTC afternoon backport window
  • 14:25 samtar@deploy2002: Finished scap: Backport for hewikivoyage: update vector 2022 wordmark and tagline (T351981) (duration: 10m 35s)
  • 14:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1138.eqiad.wmnet - arnaudb@cumin1001"
  • 14:17 samtar@deploy2002: samtar and anzx: Continuing with sync
  • 14:16 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1138.eqiad.wmnet - arnaudb@cumin1001"
  • 14:15 samtar@deploy2002: samtar and anzx: Backport for hewikivoyage: update vector 2022 wordmark and tagline (T351981) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:14 samtar@deploy2002: Started scap: Backport for hewikivoyage: update vector 2022 wordmark and tagline (T351981)
  • 14:11 samtar@deploy2002: Finished scap: Backport for Enable read new on group0 wikis (T341829) (duration: 07m 57s)
  • 14:05 samtar@deploy2002: samtar and dreamyjazz: Continuing with sync
  • 14:05 samtar@deploy2002: samtar and dreamyjazz: Backport for Enable read new on group0 wikis (T341829) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:03 samtar@deploy2002: Started scap: Backport for Enable read new on group0 wikis (T341829)
  • 13:59 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
  • 13:58 arnaudb@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:56 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
  • 13:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 13:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 13:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 13:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 13:27 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts db1138.eqiad.wmnet
  • 13:26 arnaudb@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:25 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
  • 13:22 arnaudb@cumin1001: dbctl commit (dc=all): 'decommission db1138', diff saved to https://phabricator.wikimedia.org/P54328 and previous config saved to /var/cache/conftool/dbconfig/20231211-132250-arnaudb.json
  • 13:20 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1138.eqiad.wmnet
  • 13:17 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: decomission pre downtime
  • 13:17 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: decomission pre downtime
  • 13:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 13:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 13:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 13:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 12:57 claime: Rebuilding production-images for python3-build-bookworm - T352733
  • 12:12 urbanecm@deploy2002: Finished scap: Backport for Revert "Growth: Enable Welcome survey user research for ar/en/es" (T351266) (duration: 08m 20s)
  • 12:11 brouberol: Adding spark-history(-test).svc.eqiad.wmnet CNAMEs pointing to k8s-ingress-dse.svc.eqiad.wmnet. - T352639
  • 12:05 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 12:05 urbanecm@deploy2002: urbanecm: Backport for Revert "Growth: Enable Welcome survey user research for ar/en/es" (T351266) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:03 urbanecm@deploy2002: Started scap: Backport for Revert "Growth: Enable Welcome survey user research for ar/en/es" (T351266)
  • 11:20 vgutierrez: rolling restart of pybal on lvs3010 and lvs3008 effectively enabling IPIP encapsulation on ncredir@esams - T351069
  • 11:18 claime: sudo confctl --object-type discovery select 'name=eqiad,dnsdisc=k8s-ingress-dse' set/pooled=true - T352639
  • 11:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 11:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 11:12 brouberol: Add discovery records for the k8s-ingress-dse LVS service - T352639
  • 10:55 dcausse: (properly) restarting blazegraph on wdqs1006 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 10:54 cgoubert@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs[1019-1020].eqiad.wmnet} and A:lvs (T352639)
  • 10:50 cgoubert@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs[1019-1020].eqiad.wmnet} and A:lvs (T352639)
  • 10:46 claime: Running puppet on O:lvs::balancer - T352639
  • 10:45 claime: Disabling puppet on O:lvs::balancer - T352639
  • 10:42 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
  • 10:42 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
  • 10:42 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
  • 10:38 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
  • 10:38 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
  • 10:38 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
  • 10:37 claime: Repooling dse-k8s-worker nodes - sudo confctl select "service=kubesvc,cluster=dse-k8s" set/pooled=yes - T352639
  • 10:03 jayme: removed cergen certs of all k8s services from private puppet in commit d36a97a - T300033
  • 09:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38753
  • 09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38753
  • 09:55 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
  • 09:55 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
  • 09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 1547
  • 09:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 1547
  • 09:50 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
  • 09:50 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
  • 09:44 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
  • 09:44 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
  • 08:43 kostajh: UTC morning deploys done
  • 08:43 kharlan@deploy2002: Finished scap: Backport for ClusterConfig: Rename `isTest()` to `isDebug()` for consistency (T347366), IPInfo: Add comment clarifying $wgIPInfoGeoIP2EnterprisePath (T304604) (duration: 22m 02s)
  • 08:40 dcausse: restarted blazegraph on wdqs1006 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 08:36 kharlan@deploy2002: kharlan and d3r1ck01: Continuing with sync
  • 08:22 kharlan@deploy2002: kharlan and d3r1ck01: Backport for ClusterConfig: Rename `isTest()` to `isDebug()` for consistency (T347366), IPInfo: Add comment clarifying $wgIPInfoGeoIP2EnterprisePath (T304604) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:21 kharlan@deploy2002: Started scap: Backport for ClusterConfig: Rename `isTest()` to `isDebug()` for consistency (T347366), IPInfo: Add comment clarifying $wgIPInfoGeoIP2EnterprisePath (T304604)
  • 08:16 kharlan@deploy2002: Finished scap: Backport for MediaModeration: Set MediaModerationDeveloperMode to false (duration: 09m 55s)
  • 08:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: reboot for upgrade
  • 08:15 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: reboot for upgrade
  • 08:09 kharlan@deploy2002: kharlan: Continuing with sync
  • 08:07 kharlan@deploy2002: kharlan: Backport for MediaModeration: Set MediaModerationDeveloperMode to false synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:06 kharlan@deploy2002: Started scap: Backport for MediaModeration: Set MediaModerationDeveloperMode to false
  • 07:53 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: reboot for upgrade
  • 07:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: reboot for upgrade
  • 07:31 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2185.codfw.wmnet with reason: reboot for upgrade
  • 07:31 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2185.codfw.wmnet with reason: reboot for upgrade
  • 07:24 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on db2185.codfw.wmnet with reason: reboot for upgrade
  • 07:24 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2185.codfw.wmnet with reason: reboot for upgrade
  • 07:12 marostegui: Failover m3-master from dbproxy1020 to dbproxy1026 T351864
  • 07:12 marostegui: Failover m3-master from dbproxy1020 to dbproxy1026 org
  • 06:44 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1" (duration: 08m 22s)
  • 06:37 marostegui@deploy2002: marostegui: Continuing with sync
  • 06:37 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:35 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1"
  • 06:35 _joe_: update sirenbot to 0.3.7
  • 06:34 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1011.eqiad.wmnet with OS bookworm
  • 06:29 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 06:26 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 06:19 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 06:16 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 06:13 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 06:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1011.eqiad.wmnet with reason: host reimage
  • 06:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1011.eqiad.wmnet with reason: host reimage
  • 06:07 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 06:07 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 05:55 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1011.eqiad.wmnet with OS bookworm
  • 05:54 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc1 (T351787) (duration: 16m 54s)
  • 05:47 marostegui@deploy2002: marostegui: Continuing with sync
  • 05:46 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc1 (T351787) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 05:37 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc1 (T351787)
  • 05:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: pc1 master switch T351787
  • 05:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: pc1 master switch T351787

2023-12-09

  • 15:53 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
  • 15:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2006.codfw.wmnet with OS bullseye
  • 15:49 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2005.codfw.wmnet with OS bullseye
  • 01:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2002.codfw.wmnet with OS bullseye
  • 00:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
  • 00:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
  • 00:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
  • 00:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
  • 00:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
  • 00:47 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
  • 00:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2005.codfw.wmnet with OS bullseye
  • 00:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
  • 00:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2005.codfw.wmnet with OS bullseye
  • 00:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
  • 00:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
  • 00:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
  • 00:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2006.codfw.wmnet with OS bullseye
  • 00:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
  • 00:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2005.codfw.wmnet with OS bullseye
  • 00:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
  • 00:01 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:01 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply

2023-12-08

  • 23:49 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 23:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 23:48 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 23:48 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 23:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 23:47 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2003.codfw.wmnet with OS bullseye
  • 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2002.codfw.wmnet with OS bullseye
  • 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2003.codfw.wmnet with reason: host reimage
  • 23:04 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:03 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:02 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2003.codfw.wmnet with reason: host reimage
  • 22:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2003.codfw.wmnet with OS bullseye
  • 22:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2006.codfw.wmnet with OS bullseye
  • 22:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
  • 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
  • 22:26 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:26 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
  • 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2001.codfw.wmnet with OS bullseye
  • 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2001.codfw.wmnet with reason: host reimage
  • 21:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2001.codfw.wmnet with reason: host reimage
  • 21:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2001.codfw.wmnet with OS bullseye
  • 20:02 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:02 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:27 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:27 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:26 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:26 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:09 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:08 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:49 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:49 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs1015.eqiad.wmnet with reason: T347355
  • 16:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs1015.eqiad.wmnet with reason: T347355
  • 16:08 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:50 milimetric@deploy2002: Finished deploy [airflow-dags/platform_eng@049cf03]: (no justification provided) (duration: 00m 52s)
  • 15:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
  • 15:49 milimetric@deploy2002: Started deploy [airflow-dags/platform_eng@049cf03]: (no justification provided)
  • 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
  • 15:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
  • 15:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
  • 15:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
  • 15:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
  • 15:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
  • 15:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2005.codfw.wmnet with OS bullseye
  • 15:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
  • 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
  • 15:09 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
  • 14:44 XioNoX: drain eqiad-codfw lumen transport for maintenance - T342502
  • 14:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/echoserver: apply
  • 14:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/echoserver: apply
  • 14:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:55 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 12:55 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 12:42 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 12:42 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 11:40 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:40 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54322 and previous config saved to /var/cache/conftool/dbconfig/20231208-101337-arnaudb.json
  • 09:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P54321 and previous config saved to /var/cache/conftool/dbconfig/20231208-095830-arnaudb.json
  • 09:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P54320 and previous config saved to /var/cache/conftool/dbconfig/20231208-094324-arnaudb.json
  • 09:41 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:41 brouberol: Creating the echoserver namespace in dse-k8s-eqiad - T353004
  • 09:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54319 and previous config saved to /var/cache/conftool/dbconfig/20231208-092817-arnaudb.json
  • 09:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54318 and previous config saved to /var/cache/conftool/dbconfig/20231208-091628-arnaudb.json
  • 09:16 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 09:16 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 07:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 237
  • 07:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 237
  • 06:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 06:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T343198)', diff saved to https://phabricator.wikimedia.org/P54317 and previous config saved to /var/cache/conftool/dbconfig/20231208-062636-ladsgroup.json
  • 06:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P54316 and previous config saved to /var/cache/conftool/dbconfig/20231208-061130-ladsgroup.json
  • 05:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P54315 and previous config saved to /var/cache/conftool/dbconfig/20231208-055623-ladsgroup.json
  • 05:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T343198)', diff saved to https://phabricator.wikimedia.org/P54314 and previous config saved to /var/cache/conftool/dbconfig/20231208-054116-ladsgroup.json
  • 05:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1231 (T343198)', diff saved to https://phabricator.wikimedia.org/P54313 and previous config saved to /var/cache/conftool/dbconfig/20231208-050624-ladsgroup.json
  • 05:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 05:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 04:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 04:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P54312 and previous config saved to /var/cache/conftool/dbconfig/20231208-041826-ladsgroup.json
  • 04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P54311 and previous config saved to /var/cache/conftool/dbconfig/20231208-040319-ladsgroup.json
  • 03:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P54310 and previous config saved to /var/cache/conftool/dbconfig/20231208-034813-ladsgroup.json
  • 03:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P54309 and previous config saved to /var/cache/conftool/dbconfig/20231208-033306-ladsgroup.json
  • 03:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P54308 and previous config saved to /var/cache/conftool/dbconfig/20231208-030005-ladsgroup.json
  • 03:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 02:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 02:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P54307 and previous config saved to /var/cache/conftool/dbconfig/20231208-025942-ladsgroup.json
  • 02:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P54306 and previous config saved to /var/cache/conftool/dbconfig/20231208-024435-ladsgroup.json
  • 02:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P54305 and previous config saved to /var/cache/conftool/dbconfig/20231208-022929-ladsgroup.json
  • 02:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
  • 02:19 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
  • 02:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
  • 02:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
  • 02:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sessionstore2004']
  • 02:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2004']
  • 02:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P54304 and previous config saved to /var/cache/conftool/dbconfig/20231208-021422-ladsgroup.json
  • 02:12 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
  • 01:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P54303 and previous config saved to /var/cache/conftool/dbconfig/20231208-012115-ladsgroup.json
  • 01:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 01:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 01:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P54302 and previous config saved to /var/cache/conftool/dbconfig/20231208-012051-ladsgroup.json
  • 01:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P54301 and previous config saved to /var/cache/conftool/dbconfig/20231208-010545-ladsgroup.json
  • 00:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P54300 and previous config saved to /var/cache/conftool/dbconfig/20231208-005038-ladsgroup.json
  • 00:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1036.eqiad.wmnet with OS bullseye
  • 00:44 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:43 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1037.eqiad.wmnet with OS bullseye
  • 00:43 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:42 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:38 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1035.eqiad.wmnet with OS bullseye
  • 00:38 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:37 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:36 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1038.eqiad.wmnet with OS bullseye
  • 00:36 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P54299 and previous config saved to /var/cache/conftool/dbconfig/20231208-003532-ladsgroup.json
  • 00:35 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti1036.eqiad.wmnet with reason: host reimage
  • 00:24 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1037.eqiad.wmnet with reason: host reimage
  • 00:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1035.eqiad.wmnet with reason: host reimage
  • 00:19 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1038.eqiad.wmnet with reason: host reimage
  • 00:16 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1037.eqiad.wmnet with reason: host reimage
  • 00:16 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1036.eqiad.wmnet with reason: host reimage
  • 00:15 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1035.eqiad.wmnet with reason: host reimage
  • 00:15 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1038.eqiad.wmnet with reason: host reimage
  • 00:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1038.eqiad.wmnet with OS bullseye
  • 00:00 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1037.eqiad.wmnet with OS bullseye
  • 00:00 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1036.eqiad.wmnet with OS bullseye
  • 00:00 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1035.eqiad.wmnet with OS bullseye

2023-12-07

  • 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P54298 and previous config saved to /var/cache/conftool/dbconfig/20231207-235333-ladsgroup.json
  • 23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T343198)', diff saved to https://phabricator.wikimedia.org/P54297 and previous config saved to /var/cache/conftool/dbconfig/20231207-235310-ladsgroup.json
  • 23:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1061.eqiad.wmnet with OS bullseye
  • 23:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1062.eqiad.wmnet with OS bullseye
  • 23:52 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 23:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1059.eqiad.wmnet with OS bullseye
  • 23:52 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 23:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1060.eqiad.wmnet with OS bullseye
  • 23:52 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 23:52 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 23:47 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:47 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P54296 and previous config saved to /var/cache/conftool/dbconfig/20231207-233802-ladsgroup.json
  • 23:23 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:23 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:23 ryankemper@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P54295 and previous config saved to /var/cache/conftool/dbconfig/20231207-232256-ladsgroup.json
  • 23:21 ryankemper@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 23:21 ryankemper@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 23:21 ryankemper@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 23:17 ryankemper@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:15 ryankemper@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T343198)', diff saved to https://phabricator.wikimedia.org/P54294 and previous config saved to /var/cache/conftool/dbconfig/20231207-230749-ladsgroup.json
  • 23:05 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 22:58 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 22:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp4037.ulsfo.wmnet
  • 22:53 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 22:48 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 22:38 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1061.eqiad.wmnet with reason: host reimage
  • 22:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1060.eqiad.wmnet with reason: host reimage
  • 22:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1062.eqiad.wmnet with reason: host reimage
  • 22:33 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1059.eqiad.wmnet with reason: host reimage
  • 22:31 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1061.eqiad.wmnet with reason: host reimage
  • 22:30 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1062.eqiad.wmnet with reason: host reimage
  • 22:30 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1060.eqiad.wmnet with reason: host reimage
  • 22:29 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1059.eqiad.wmnet with reason: host reimage
  • 22:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T343198)', diff saved to https://phabricator.wikimedia.org/P54293 and previous config saved to /var/cache/conftool/dbconfig/20231207-222656-ladsgroup.json
  • 22:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 22:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 22:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T343198)', diff saved to https://phabricator.wikimedia.org/P54292 and previous config saved to /var/cache/conftool/dbconfig/20231207-222633-ladsgroup.json
  • 22:22 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:22 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:20 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:20 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:19 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:19 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:16 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1062.eqiad.wmnet with OS bullseye
  • 22:16 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1061.eqiad.wmnet with OS bullseye
  • 22:16 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1060.eqiad.wmnet with OS bullseye
  • 22:15 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1059.eqiad.wmnet with OS bullseye
  • 22:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1061.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1060.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1062.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1059.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P54291 and previous config saved to /var/cache/conftool/dbconfig/20231207-221127-ladsgroup.json
  • 22:10 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:10 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1062.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1061.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1060.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1059.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P54290 and previous config saved to /var/cache/conftool/dbconfig/20231207-215620-ladsgroup.json
  • 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T343198)', diff saved to https://phabricator.wikimedia.org/P54289 and previous config saved to /var/cache/conftool/dbconfig/20231207-214114-ladsgroup.json
  • 21:38 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@049cf03]: (no justification provided) (duration: 00m 28s)
  • 21:37 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@049cf03]: (no justification provided)
  • 21:31 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1082.eqiad.wmnet with OS bullseye
  • 21:31 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 21:23 jdrewniak@deploy2002: Finished scap: Backport for Enable Vector beta feature for all wikis (T351339), [beta] ores-extension: enable revertrisk model for enwiki (T348298), Enable action blocks in Serbian Wikipedia (T351873) (duration: 09m 54s)
  • 21:17 jdrewniak@deploy2002: zoranzoki21 and isaranto and jdlrobson and jdrewniak: Continuing with sync
  • 21:15 jdrewniak@deploy2002: zoranzoki21 and isaranto and jdlrobson and jdrewniak: Backport for Enable Vector beta feature for all wikis (T351339), [beta] ores-extension: enable revertrisk model for enwiki (T348298), Enable action blocks in Serbian Wikipedia (T351873) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:13 jdrewniak@deploy2002: Started scap: Backport for Enable Vector beta feature for all wikis (T351339), [beta] ores-extension: enable revertrisk model for enwiki (T348298), Enable action blocks in Serbian Wikipedia (T351873)
  • 21:06 otto@deploy2002: Synchronized wmf-config/ext-EventStreamConfig.php: Config: Remove eventlogging_FeaturePolicyViolation and _SpecialMuteSubmit EventStreamConfig (T329718) (duration: 09m 16s)
  • 21:02 dcausse: restarting blazegraph on wdqs2017 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T343198)', diff saved to https://phabricator.wikimedia.org/P54288 and previous config saved to /var/cache/conftool/dbconfig/20231207-205817-ladsgroup.json
  • 20:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 20:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T343198)', diff saved to https://phabricator.wikimedia.org/P54287 and previous config saved to /var/cache/conftool/dbconfig/20231207-205753-ladsgroup.json
  • 20:56 otto@deploy2002: Synchronized wmf-config/ext-EventLogging.php: Config: Remove eventlogging_FeaturePolicyViolation and _SpecialMuteSubmit EventLogging config (T329718) (duration: 07m 07s)
  • 20:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P54286 and previous config saved to /var/cache/conftool/dbconfig/20231207-204247-ladsgroup.json
  • 20:30 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp4037.ulsfo.wmnet
  • 20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P54285 and previous config saved to /var/cache/conftool/dbconfig/20231207-202740-ladsgroup.json
  • 20:27 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T343198)', diff saved to https://phabricator.wikimedia.org/P54283 and previous config saved to /var/cache/conftool/dbconfig/20231207-201234-ladsgroup.json
  • 20:06 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1082.eqiad.wmnet with reason: host reimage
  • 20:05 urandom: bootstrap Cassandra/restbase2030-a — T352468
  • 20:02 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1082.eqiad.wmnet with reason: host reimage
  • 20:01 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:01 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:59 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:59 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:49 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1082.eqiad.wmnet with OS bullseye
  • 19:38 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:38 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:35 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wdqs[1022-1024].eqiad.wmnet with reason: graph split experiments T350106
  • 19:35 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wdqs[1022-1024].eqiad.wmnet with reason: graph split experiments T350106
  • 19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T343198)', diff saved to https://phabricator.wikimedia.org/P54282 and previous config saved to /var/cache/conftool/dbconfig/20231207-192949-ladsgroup.json
  • 19:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 19:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T343198)', diff saved to https://phabricator.wikimedia.org/P54281 and previous config saved to /var/cache/conftool/dbconfig/20231207-192926-ladsgroup.json
  • 19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P54280 and previous config saved to /var/cache/conftool/dbconfig/20231207-191420-ladsgroup.json
  • 18:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P54279 and previous config saved to /var/cache/conftool/dbconfig/20231207-185913-ladsgroup.json
  • 18:45 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:45 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T343198)', diff saved to https://phabricator.wikimedia.org/P54278 and previous config saved to /var/cache/conftool/dbconfig/20231207-184406-ladsgroup.json
  • 18:42 mutante: puppetmaster1001 - revoke cert for miscweb.discovery.wmnet
  • 18:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 18:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 18:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 18:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T343198)', diff saved to https://phabricator.wikimedia.org/P54277 and previous config saved to /var/cache/conftool/dbconfig/20231207-180427-ladsgroup.json
  • 18:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 18:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 18:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 18:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 17:58 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs1024.eqiad.wmnet
  • 17:57 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs1024.eqiad.wmnet
  • 17:40 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 17:40 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 17:39 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 17:38 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 17:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2002.codfw.wmnet with OS bullseye
  • 17:09 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:09 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cleanup logstash/kibana records T299700 - herron@cumin1001"
  • 17:08 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cleanup logstash/kibana records T299700 - herron@cumin1001"
  • 17:05 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 16:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 16:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 16:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 16:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 16:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
  • 16:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2002.codfw.wmnet with OS bullseye
  • 16:38 brouberol@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 16:27 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 16:27 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 16:26 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 16:26 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 16:25 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 16:24 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 16:24 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 16:23 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 16:09 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 16:09 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 16:02 sukhe: run dummy authdns-update on dns6001
  • 16:00 milimetric@deploy2002: Finished deploy [analytics/refinery@8b8f178] (thin): hotfix: sqoop (duration: 00m 07s)
  • 16:00 milimetric@deploy2002: Started deploy [analytics/refinery@8b8f178] (thin): hotfix: sqoop
  • 15:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54274 and previous config saved to /var/cache/conftool/dbconfig/20231207-155712-arnaudb.json
  • 15:55 milimetric@deploy2002: Finished deploy [analytics/refinery@8b8f178]: hotfix: sqoop (duration: 10m 08s)
  • 15:53 sukhe: running authdns-update with broken resolv.conf on dns6001
  • 15:48 sukhe: clear out dns6001 resolv.conf to check for SSH config-based authdns-update
  • 15:45 milimetric@deploy2002: Started deploy [analytics/refinery@8b8f178]: hotfix: sqoop
  • 15:45 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 15:44 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 15:44 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 15:44 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 15:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P54273 and previous config saved to /var/cache/conftool/dbconfig/20231207-154205-arnaudb.json
  • 15:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2006.codfw.wmnet with OS bullseye
  • 15:36 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2005.codfw.wmnet with OS bullseye
  • 15:29 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 15:28 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 15:28 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 15:27 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 15:27 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 15:27 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 15:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P54272 and previous config saved to /var/cache/conftool/dbconfig/20231207-152659-arnaudb.json
  • 15:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 15:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cp4037.ulsfo.wmnet
  • 15:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54271 and previous config saved to /var/cache/conftool/dbconfig/20231207-151152-arnaudb.json
  • 15:08 klausman@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 15:08 klausman@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 15:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54270 and previous config saved to /var/cache/conftool/dbconfig/20231207-150750-arnaudb.json
  • 15:07 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 15:07 klausman@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 15:07 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 15:07 klausman@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 15:06 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 15:06 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 15:04 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 15:03 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 15:02 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 15:01 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 15:01 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 15:00 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 14:53 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 14:53 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 14:53 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 14:53 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 14:53 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
  • 14:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2002.codfw.wmnet with OS bullseye
  • 14:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
  • 14:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
  • 14:48 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2005.codfw.wmnet with OS bullseye
  • 14:48 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
  • 14:41 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp4037.ulsfo.wmnet
  • 14:32 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 14:31 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:30 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 14:29 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 14:26 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 14:26 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 14:26 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 13:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 13:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 13:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 13:52 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 13:49 ladsgroup@deploy2002: Finished scap: Backport for api: Only force backlink namespace index when there is one ns only (T351237) (duration: 10m 55s)
  • 13:42 ladsgroup@deploy2002: jforrester and ladsgroup: Continuing with sync
  • 13:40 ladsgroup@deploy2002: jforrester and ladsgroup: Backport for api: Only force backlink namespace index when there is one ns only (T351237) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:38 ladsgroup@deploy2002: Started scap: Backport for api: Only force backlink namespace index when there is one ns only (T351237)
  • 13:34 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:34 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:34 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:33 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:32 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:32 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:31 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:31 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:27 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 13:27 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 13:25 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: sync
  • 13:25 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:25 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: sync
  • 13:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 13:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 13:19 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 13:18 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 13:10 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 13:09 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 13:09 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 13:09 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 13:09 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 13:08 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 13:07 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 13:07 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 12:52 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 12:52 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 12:48 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:48 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:47 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:47 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1001.eqiad.wmnet
  • 12:18 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 12:18 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 12:17 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 12:17 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 12:17 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:16 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:13 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1001.eqiad.wmnet
  • 11:51 btullis@deploy2002: Finished deploy [analytics/refinery@b6499b1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b6499b17] (duration: 03m 17s)
  • 11:48 btullis@deploy2002: Started deploy [analytics/refinery@b6499b1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b6499b17]
  • 11:33 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 11:33 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 11:30 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 11:30 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 11:30 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 11:30 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 11:17 klausman@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 11:17 klausman@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 11:14 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main'.
  • 11:14 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 11:13 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 11:13 klausman@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 11:12 klausman@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 11:10 aikochou@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main'.
  • 11:10 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 11:01 brouberol@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 10:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 10:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 10:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: cluster::management
  • 10:53 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 10:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 10:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 10:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 10:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 10:45 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: cluster::management
  • 10:38 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 10:38 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 10:35 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 10:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 10:34 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 10:34 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 10:33 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 10:33 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 10:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 10:32 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 10:27 brouberol@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 10:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 10:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 10:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 10:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 09:42 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 09:42 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 09:41 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 09:40 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 09:40 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 09:39 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 31 days, 0:00:00 on sretest1001.eqiad.wmnet with reason: WIP nftables
  • 08:52 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 31 days, 0:00:00 on sretest1001.eqiad.wmnet with reason: WIP nftables
  • 08:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4005.wikimedia.org
  • 08:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4005.wikimedia.org
  • 08:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1119.eqiad.wmnet
  • 06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1119.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 06:52 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1119.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 06:50 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:44 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1119.eqiad.wmnet
  • 06:35 marostegui: Failover m5-master from dbproxy1021 to dbproxy1027 T351864
  • 00:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1082.eqiad.wmnet with OS bullseye
  • 00:53 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1081.eqiad.wmnet with OS bullseye
  • 00:53 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1080.eqiad.wmnet with OS bullseye
  • 00:53 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:53 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"

2023-12-06

  • 23:56 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1082.eqiad.wmnet with OS bullseye
  • 23:47 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 23:42 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 23:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1081.eqiad.wmnet with reason: host reimage
  • 23:23 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1080.eqiad.wmnet with reason: host reimage
  • 23:20 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1081.eqiad.wmnet with reason: host reimage
  • 23:19 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1080.eqiad.wmnet with reason: host reimage
  • 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 34 days, 0:00:00 on wdqs1024.eqiad.wmnet with reason: T352878
  • 23:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 34 days, 0:00:00 on wdqs1024.eqiad.wmnet with reason: T352878
  • 22:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1081.eqiad.wmnet with OS bullseye
  • 22:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1082.eqiad.wmnet with OS bullseye
  • 22:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1080.eqiad.wmnet with OS bullseye
  • 22:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1080']
  • 22:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1082']
  • 22:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1081']
  • 22:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1080']
  • 22:43 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-be1081']
  • 22:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1081']
  • 22:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1080']
  • 22:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1082']
  • 22:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1081']
  • 22:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1080']
  • 22:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1082.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1080.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1081.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:56 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1082.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1080.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:51 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1081.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:51 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:50 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
  • 21:50 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
  • 21:47 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 21:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1082.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:43 urbanecm@deploy2002: Finished scap: Backport for Correct links to beta feature (T352826), Beta Features: Allow Vector 2022 typography feature (T351339) (duration: 10m 51s)
  • 21:36 urbanecm@deploy2002: urbanecm and jdlrobson: Continuing with sync
  • 21:35 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1082.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:35 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:35 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
  • 21:34 urbanecm@deploy2002: urbanecm and jdlrobson: Backport for Correct links to beta feature (T352826), Beta Features: Allow Vector 2022 typography feature (T351339) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:34 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
  • 21:33 urbanecm@deploy2002: Started scap: Backport for Correct links to beta feature (T352826), Beta Features: Allow Vector 2022 typography feature (T351339)
  • 21:32 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 21:31 urbanecm@deploy2002: Finished scap: Backport for DiscussionTools: Rename config (duration: 10m 01s)
  • 21:25 urbanecm@deploy2002: esanders and urbanecm: Continuing with sync
  • 21:22 urbanecm@deploy2002: esanders and urbanecm: Backport for DiscussionTools: Rename config synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:21 urbanecm@deploy2002: Started scap: Backport for DiscussionTools: Rename config
  • 21:20 urbanecm@deploy2002: Finished scap: Backport for Enable DT visual enhancements on pages with (T352232) (duration: 10m 43s)
  • 21:13 urbanecm@deploy2002: urbanecm and esanders: Continuing with sync
  • 21:11 urbanecm@deploy2002: urbanecm and esanders: Backport for Enable DT visual enhancements on pages with (T352232) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:09 urbanecm@deploy2002: Started scap: Backport for Enable DT visual enhancements on pages with (T352232)
  • 20:55 ejegg: fundraising civicrm upgraded from 6ca683b2 to 8c107215
  • 19:07 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host wdqs1024.eqiad.wmnet
  • 18:55 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1024.eqiad.wmnet
  • 18:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs1024.eqiad.wmnet with reason: T352878
  • 18:53 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs1024.eqiad.wmnet with reason: T352878
  • 18:18 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2001.codfw.wmnet with OS bullseye
  • 18:02 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
  • 17:47 ejegg: standalone SmashPig upgraded from 83d509ed to fc74ccca
  • 17:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
  • 17:34 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 17:17 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp4037.ulsfo.wmnet
  • 17:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4037.ulsfo.wmnet with OS bullseye
  • 17:06 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp4037.ulsfo.wmnet
  • 17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cp4037.ulsfo.wmnet
  • 17:05 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp4037.ulsfo.wmnet
  • 16:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd2001.codfw.wmnet with OS bullseye
  • 16:51 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main'.
  • 16:41 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 16:40 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 16:40 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 16:40 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 16:29 urandom: bootstrapping Cassandra/restbase2020-a — T352468
  • 16:07 milimetric@deploy2002: Finished deploy [airflow-dags/platform_eng@db1cb48]: in order to run the querypage job (duration: 01m 28s)
  • 16:05 milimetric@deploy2002: Started deploy [airflow-dags/platform_eng@db1cb48]: in order to run the querypage job
  • 15:56 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: sync
  • 15:56 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: sync
  • 15:52 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:51 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
  • 15:48 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:48 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:47 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:46 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:46 urandom: restarting Cassandra on aqs2001-{a,b,c} (testing puppet 7 migration)
  • 15:41 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: sessionstore
  • 15:39 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 15:39 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 15:38 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 15:38 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 15:38 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 15:37 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 15:35 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:34 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:33 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: sessionstore
  • 15:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2001.codfw.wmnet with OS bullseye
  • 15:32 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2001.codfw.wmnet with OS bullseye
  • 15:30 jforrester@deploy2002: Finished scap: Backport for Beta Features: Move ULS Compact Links to only the wikis it's enabled on, Beta Features: Drop Popups, deployed everywhere for ages (duration: 11m 33s)
  • 15:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:28 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 15:28 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:28 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 15:28 jayme@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cephosd2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cephosd2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: restbase::production
  • 15:23 sukhe: depool cp4037 for reimage testing: T350179
  • 15:23 jforrester@deploy2002: jforrester: Continuing with sync
  • 15:21 jforrester@deploy2002: jforrester: Backport for Beta Features: Move ULS Compact Links to only the wikis it's enabled on, Beta Features: Drop Popups, deployed everywhere for ages synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['testhost2001']
  • 15:19 jforrester@deploy2002: Started scap: Backport for Beta Features: Move ULS Compact Links to only the wikis it's enabled on, Beta Features: Drop Popups, deployed everywhere for ages
  • 15:14 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['testhost2001']
  • 15:04 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: restbase::production
  • 15:02 moritzm: installing mariadb bugfix updates from Bookworm point release (as packaged in Debian, unrelated to wmf-mariadb packages)
  • 14:43 moritzm: installing debian-archive-keyring updates from Bookworm point release
  • 14:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dnsbox
  • 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:32 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:23 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dnsbox
  • 14:21 fabfur: repooling cp4052 after reimage (bookworm -> bullseye) due to possible impacting T352744
  • 13:48 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:48 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:46 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4052.ulsfo.wmnet
  • 13:45 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:45 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:37 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1001.eqiad.wmnet with OS bookworm
  • 13:37 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1001"
  • 13:20 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4052.ulsfo.wmnet
  • 13:12 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@bfd944e]: Add metrics configuration TEST [airflow-dags@bfd944e4] (duration: 00m 11s)
  • 13:12 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@bfd944e]: Add metrics configuration TEST [airflow-dags@bfd944e4]
  • 13:08 moritzm: installing systemd bugfix updates from Bookworm point release
  • 12:52 mvernon@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1001"
  • 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4041.ulsfo.wmnet
  • 12:41 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4041.ulsfo.wmnet
  • 12:34 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1001.eqiad.wmnet with reason: host reimage
  • 12:33 moritzm: installing pam bugfix updates from Bookworm point release
  • 12:30 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1001.eqiad.wmnet with reason: host reimage
  • 12:15 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1001.eqiad.wmnet with OS bookworm
  • 11:48 hnowlan: rollback changeprop-jobqueue
  • 11:44 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: druid::analytics::worker
  • 11:43 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:42 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:41 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:40 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:33 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: druid::analytics::worker
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4044.ulsfo.wmnet
  • 11:16 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4044.ulsfo.wmnet
  • 10:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4050.ulsfo.wmnet
  • 10:38 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4050.ulsfo.wmnet
  • 10:26 moritzm: installing gtk+3.0 bugfix updates from Bookworm point release
  • 08:49 godog: test rsyslog version from bullseye-backports on centrallog - T351710
  • 08:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54264 and previous config saved to /var/cache/conftool/dbconfig/20231206-084928-arnaudb.json
  • 08:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P54263 and previous config saved to /var/cache/conftool/dbconfig/20231206-083422-arnaudb.json
  • 08:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P54262 and previous config saved to /var/cache/conftool/dbconfig/20231206-081915-arnaudb.json
  • 08:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4047.ulsfo.wmnet
  • 08:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54261 and previous config saved to /var/cache/conftool/dbconfig/20231206-080409-arnaudb.json
  • 07:54 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4047.ulsfo.wmnet
  • 07:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54260 and previous config saved to /var/cache/conftool/dbconfig/20231206-075333-arnaudb.json
  • 07:53 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 07:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 07:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T348183)', diff saved to https://phabricator.wikimedia.org/P54259 and previous config saved to /var/cache/conftool/dbconfig/20231206-075309-arnaudb.json
  • 07:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P54258 and previous config saved to /var/cache/conftool/dbconfig/20231206-073803-arnaudb.json
  • 07:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P54257 and previous config saved to /var/cache/conftool/dbconfig/20231206-072256-arnaudb.json
  • 07:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T348183)', diff saved to https://phabricator.wikimedia.org/P54256 and previous config saved to /var/cache/conftool/dbconfig/20231206-070749-arnaudb.json
  • 06:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T348183)', diff saved to https://phabricator.wikimedia.org/P54255 and previous config saved to /var/cache/conftool/dbconfig/20231206-062922-arnaudb.json
  • 06:29 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 06:29 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 06:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T348183)', diff saved to https://phabricator.wikimedia.org/P54254 and previous config saved to /var/cache/conftool/dbconfig/20231206-062859-arnaudb.json
  • 06:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P54252 and previous config saved to /var/cache/conftool/dbconfig/20231206-061352-arnaudb.json
  • 05:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P54251 and previous config saved to /var/cache/conftool/dbconfig/20231206-055846-arnaudb.json
  • 05:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T348183)', diff saved to https://phabricator.wikimedia.org/P54250 and previous config saved to /var/cache/conftool/dbconfig/20231206-054339-arnaudb.json
  • 05:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T348183)', diff saved to https://phabricator.wikimedia.org/P54249 and previous config saved to /var/cache/conftool/dbconfig/20231206-053321-arnaudb.json
  • 05:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 05:33 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 05:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T348183)', diff saved to https://phabricator.wikimedia.org/P54248 and previous config saved to /var/cache/conftool/dbconfig/20231206-053256-arnaudb.json
  • 05:19 denisse@deploy2002: Finished deploy [librenms/librenms@f049593]: Upgrade T351616 (duration: 00m 09s)
  • 05:19 denisse@deploy2002: Started deploy [librenms/librenms@f049593]: Upgrade T351616
  • 05:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P54247 and previous config saved to /var/cache/conftool/dbconfig/20231206-051750-arnaudb.json
  • 05:09 ejegg: fundraising civicrm upgraded from 6bb8a67f to 6ca683b2
  • 05:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P54246 and previous config saved to /var/cache/conftool/dbconfig/20231206-050243-arnaudb.json
  • 04:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T348183)', diff saved to https://phabricator.wikimedia.org/P54245 and previous config saved to /var/cache/conftool/dbconfig/20231206-044737-arnaudb.json
  • 04:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T348183)', diff saved to https://phabricator.wikimedia.org/P54244 and previous config saved to /var/cache/conftool/dbconfig/20231206-043718-arnaudb.json
  • 04:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 04:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 04:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 04:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 04:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54243 and previous config saved to /var/cache/conftool/dbconfig/20231206-043638-arnaudb.json
  • 04:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P54242 and previous config saved to /var/cache/conftool/dbconfig/20231206-042132-arnaudb.json
  • 04:14 ejegg: standalone (payments listener) SmashPig upgraded from f24afba3 to 83d509ed
  • 04:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P54241 and previous config saved to /var/cache/conftool/dbconfig/20231206-040625-arnaudb.json
  • 03:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54240 and previous config saved to /var/cache/conftool/dbconfig/20231206-035119-arnaudb.json
  • 03:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54239 and previous config saved to /var/cache/conftool/dbconfig/20231206-034045-arnaudb.json
  • 03:40 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 03:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 03:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54238 and previous config saved to /var/cache/conftool/dbconfig/20231206-034022-arnaudb.json
  • 03:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P54237 and previous config saved to /var/cache/conftool/dbconfig/20231206-032516-arnaudb.json
  • 03:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P54236 and previous config saved to /var/cache/conftool/dbconfig/20231206-031009-arnaudb.json
  • 02:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54235 and previous config saved to /var/cache/conftool/dbconfig/20231206-025503-arnaudb.json
  • 02:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54234 and previous config saved to /var/cache/conftool/dbconfig/20231206-024108-arnaudb.json
  • 02:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 02:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 02:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T348183)', diff saved to https://phabricator.wikimedia.org/P54233 and previous config saved to /var/cache/conftool/dbconfig/20231206-024045-arnaudb.json
  • 02:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P54232 and previous config saved to /var/cache/conftool/dbconfig/20231206-022538-arnaudb.json
  • 02:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P54231 and previous config saved to /var/cache/conftool/dbconfig/20231206-021031-arnaudb.json
  • 02:08 eileen: civicrm upgraded from 7fb98ee8 to 6bb8a67f
  • 02:01 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:00 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 01:59 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:58 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 01:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T348183)', diff saved to https://phabricator.wikimedia.org/P54230 and previous config saved to /var/cache/conftool/dbconfig/20231206-015519-arnaudb.json
  • 01:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:51 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 01:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T348183)', diff saved to https://phabricator.wikimedia.org/P54229 and previous config saved to /var/cache/conftool/dbconfig/20231206-014506-arnaudb.json
  • 01:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 01:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 01:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T348183)', diff saved to https://phabricator.wikimedia.org/P54228 and previous config saved to /var/cache/conftool/dbconfig/20231206-014443-arnaudb.json
  • 01:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
  • 01:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
  • 01:40 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 01:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:29 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 01:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P54227 and previous config saved to /var/cache/conftool/dbconfig/20231206-012936-arnaudb.json
  • 01:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2003.codfw.wmnet with OS bullseye
  • 01:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
  • 01:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2001.codfw.wmnet with OS bullseye
  • 01:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:21 eileen: civicrm upgraded from d8238788 to 7fb98ee8
  • 01:20 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:20 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
  • 01:17 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 01:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P54226 and previous config saved to /var/cache/conftool/dbconfig/20231206-011430-arnaudb.json
  • 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
  • 01:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
  • 01:10 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
  • 01:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
  • 01:03 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 01:01 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 00:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T348183)', diff saved to https://phabricator.wikimedia.org/P54225 and previous config saved to /var/cache/conftool/dbconfig/20231206-005923-arnaudb.json
  • 00:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T348183)', diff saved to https://phabricator.wikimedia.org/P54224 and previous config saved to /var/cache/conftool/dbconfig/20231206-004820-arnaudb.json
  • 00:48 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 00:48 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 00:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T348183)', diff saved to https://phabricator.wikimedia.org/P54223 and previous config saved to /var/cache/conftool/dbconfig/20231206-004756-arnaudb.json
  • 00:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P54222 and previous config saved to /var/cache/conftool/dbconfig/20231206-003249-arnaudb.json
  • 00:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P54221 and previous config saved to /var/cache/conftool/dbconfig/20231206-001742-arnaudb.json
  • 00:17 ejegg: civicrm upgraded from 297a091d to d8238788
  • 00:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T348183)', diff saved to https://phabricator.wikimedia.org/P54220 and previous config saved to /var/cache/conftool/dbconfig/20231206-000236-arnaudb.json

2023-12-05

  • 23:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T348183)', diff saved to https://phabricator.wikimedia.org/P54219 and previous config saved to /var/cache/conftool/dbconfig/20231205-235213-arnaudb.json
  • 23:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 23:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 23:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 23:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 23:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T348183)', diff saved to https://phabricator.wikimedia.org/P54218 and previous config saved to /var/cache/conftool/dbconfig/20231205-234425-arnaudb.json
  • 23:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P54217 and previous config saved to /var/cache/conftool/dbconfig/20231205-232918-arnaudb.json
  • 23:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P54216 and previous config saved to /var/cache/conftool/dbconfig/20231205-231412-arnaudb.json
  • 22:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T348183)', diff saved to https://phabricator.wikimedia.org/P54215 and previous config saved to /var/cache/conftool/dbconfig/20231205-225905-arnaudb.json
  • 22:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T348183)', diff saved to https://phabricator.wikimedia.org/P54214 and previous config saved to /var/cache/conftool/dbconfig/20231205-224838-arnaudb.json
  • 22:48 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 22:48 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 22:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T348183)', diff saved to https://phabricator.wikimedia.org/P54213 and previous config saved to /var/cache/conftool/dbconfig/20231205-224816-arnaudb.json
  • 22:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P54212 and previous config saved to /var/cache/conftool/dbconfig/20231205-223309-arnaudb.json
  • 22:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P54211 and previous config saved to /var/cache/conftool/dbconfig/20231205-221803-arnaudb.json
  • 22:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T348183)', diff saved to https://phabricator.wikimedia.org/P54210 and previous config saved to /var/cache/conftool/dbconfig/20231205-220256-arnaudb.json
  • 22:01 jforrester@deploy2002: Finished scap: Backport for Define the corresponding stream for scroll (T350883), Add stream config for *webuiactions via Metrics Platform (T351298) (duration: 19m 01s)
  • 21:53 jforrester@deploy2002: ksarabia and jforrester and cjming: Continuing with sync
  • 21:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T348183)', diff saved to https://phabricator.wikimedia.org/P54209 and previous config saved to /var/cache/conftool/dbconfig/20231205-215135-arnaudb.json
  • 21:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 21:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 21:43 jforrester@deploy2002: ksarabia and jforrester and cjming: Backport for Define the corresponding stream for scroll (T350883), Add stream config for *webuiactions via Metrics Platform (T351298) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 21:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 21:42 jforrester@deploy2002: Started scap: Backport for Define the corresponding stream for scroll (T350883), Add stream config for *webuiactions via Metrics Platform (T351298)
  • 21:40 jforrester@deploy2002: Finished scap: Backport for [Zebra] Make .vector-column-start cache compatible (T347712 T351830), Fix nonzebra sticky container scrolling behavior and scrollable indicator (T352464) (duration: 12m 50s)
  • 21:35 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 21:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 21:34 jforrester@deploy2002: jdlrobson and jforrester and jdrewniak: Continuing with sync
  • 21:30 jforrester@deploy2002: jdlrobson and jforrester and jdrewniak: Backport for [Zebra] Make .vector-column-start cache compatible (T347712 T351830), Fix nonzebra sticky container scrolling behavior and scrollable indicator (T352464) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:27 jforrester@deploy2002: Started scap: Backport for [Zebra] Make .vector-column-start cache compatible (T347712 T351830), Fix nonzebra sticky container scrolling behavior and scrollable indicator (T352464)
  • 21:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 21:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 21:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T348183)', diff saved to https://phabricator.wikimedia.org/P54208 and previous config saved to /var/cache/conftool/dbconfig/20231205-212707-arnaudb.json
  • 21:27 jforrester@deploy2002: Finished scap: Backport for Deploy VectorClientPreferences to beta on pl,fr,ca,fa,tr wikis (T351339) (duration: 13m 44s)
  • 21:19 jforrester@deploy2002: bwang and jforrester: Continuing with sync
  • 21:13 jforrester@deploy2002: Started scap: Backport for Deploy VectorClientPreferences to beta on pl,fr,ca,fa,tr wikis (T351339)
  • 21:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P54207 and previous config saved to /var/cache/conftool/dbconfig/20231205-211200-arnaudb.json
  • 21:11 jforrester@deploy2002: Finished scap: Backport for Revert "Do not try to use Thumbor on beta" (T344605), nlwikivoyage: Drop Listings extension (T352696), Drop Listings extension from Wikivoyages where unused (T352719) (duration: 08m 45s)
  • 21:04 jforrester@deploy2002: tgr and jforrester: Continuing with sync
  • 21:04 jforrester@deploy2002: tgr and jforrester: Backport for Revert "Do not try to use Thumbor on beta" (T344605), nlwikivoyage: Drop Listings extension (T352696), Drop Listings extension from Wikivoyages where unused (T352719) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:02 jforrester@deploy2002: Started scap: Backport for Revert "Do not try to use Thumbor on beta" (T344605), nlwikivoyage: Drop Listings extension (T352696), Drop Listings extension from Wikivoyages where unused (T352719)
  • 20:58 inflatador: bking@prometheus1006 disable puppet for troubleshooting T347355
  • 20:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P54206 and previous config saved to /var/cache/conftool/dbconfig/20231205-205654-arnaudb.json
  • 20:53 inflatador: bking@prometheus1006 reload prometheus-blackbox service T347355
  • 20:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T348183)', diff saved to https://phabricator.wikimedia.org/P54205 and previous config saved to /var/cache/conftool/dbconfig/20231205-204147-arnaudb.json
  • 20:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1219 (T348183)', diff saved to https://phabricator.wikimedia.org/P54204 and previous config saved to /var/cache/conftool/dbconfig/20231205-203158-arnaudb.json
  • 20:31 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 20:31 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 20:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T348183)', diff saved to https://phabricator.wikimedia.org/P54203 and previous config saved to /var/cache/conftool/dbconfig/20231205-203136-arnaudb.json
  • 20:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P54202 and previous config saved to /var/cache/conftool/dbconfig/20231205-201629-arnaudb.json
  • 20:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P54201 and previous config saved to /var/cache/conftool/dbconfig/20231205-200123-arnaudb.json
  • 19:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T348183)', diff saved to https://phabricator.wikimedia.org/P54200 and previous config saved to /var/cache/conftool/dbconfig/20231205-194616-arnaudb.json
  • 19:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1218 (T348183)', diff saved to https://phabricator.wikimedia.org/P54199 and previous config saved to /var/cache/conftool/dbconfig/20231205-193627-arnaudb.json
  • 19:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 19:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 19:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T348183)', diff saved to https://phabricator.wikimedia.org/P54198 and previous config saved to /var/cache/conftool/dbconfig/20231205-193604-arnaudb.json
  • 19:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P54197 and previous config saved to /var/cache/conftool/dbconfig/20231205-192057-arnaudb.json
  • 19:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P54196 and previous config saved to /var/cache/conftool/dbconfig/20231205-190551-arnaudb.json
  • 18:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T348183)', diff saved to https://phabricator.wikimedia.org/P54195 and previous config saved to /var/cache/conftool/dbconfig/20231205-185044-arnaudb.json
  • 18:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1207 (T348183)', diff saved to https://phabricator.wikimedia.org/P54194 and previous config saved to /var/cache/conftool/dbconfig/20231205-184108-arnaudb.json
  • 18:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 18:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 18:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54193 and previous config saved to /var/cache/conftool/dbconfig/20231205-184045-arnaudb.json
  • 18:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P54192 and previous config saved to /var/cache/conftool/dbconfig/20231205-182539-arnaudb.json
  • 18:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS bullseye
  • 18:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P54191 and previous config saved to /var/cache/conftool/dbconfig/20231205-181032-arnaudb.json
  • 17:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54190 and previous config saved to /var/cache/conftool/dbconfig/20231205-175526-arnaudb.json
  • 17:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 17:49 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 17:46 vgutierrez: rolling restart of text|secondary LVS on drmrs effectively enabling IPIP encapsulation for ncredir@drmrs - T351069
  • 17:29 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 17:29 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 17:29 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 17:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bullseye
  • 17:28 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 17:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['testhost2001']
  • 17:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['testhost2001']
  • 17:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['testhost2001']
  • 17:11 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bullseye
  • 17:00 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1002.eqiad.wmnet with OS bookworm
  • 16:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54189 and previous config saved to /var/cache/conftool/dbconfig/20231205-165503-arnaudb.json
  • 16:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 16:54 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 16:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T348183)', diff saved to https://phabricator.wikimedia.org/P54188 and previous config saved to /var/cache/conftool/dbconfig/20231205-165439-arnaudb.json
  • 16:52 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bullseye
  • 16:52 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bullseye
  • 16:47 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bullseye
  • 16:42 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['testhost2001']
  • 16:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host testhost2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P54187 and previous config saved to /var/cache/conftool/dbconfig/20231205-163933-arnaudb.json
  • 16:37 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1002.eqiad.wmnet with reason: host reimage
  • 16:34 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1002.eqiad.wmnet with reason: host reimage
  • 16:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P54186 and previous config saved to /var/cache/conftool/dbconfig/20231205-162426-arnaudb.json
  • 16:24 claime: Rolling back k8s-ingress-dse - restarting pybal on lvs1019 - T352639
  • 16:18 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:18 claime: Rolling back k8s-ingress-dse - restarting pybal on lvs1020 - T352639
  • 16:18 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:18 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:17 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:14 samtar@deploy2002: Finished scap: Backport for .well-known: Add F-Droid signature to assetlinks.json (T346951) (duration: 07m 53s)
  • 16:11 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
  • 16:09 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
  • 16:09 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
  • 16:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T348183)', diff saved to https://phabricator.wikimedia.org/P54185 and previous config saved to /var/cache/conftool/dbconfig/20231205-160920-arnaudb.json
  • 16:09 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
  • 16:08 samtar@deploy2002: samtar: Continuing with sync
  • 16:08 samtar@deploy2002: samtar: Backport for .well-known: Add F-Droid signature to assetlinks.json (T346951) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:07 samtar@deploy2002: Started scap: Backport for .well-known: Add F-Droid signature to assetlinks.json (T346951)
  • 16:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host testhost2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:00 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding testhost2001 to codfw - jhancock@cumin2002"
  • 15:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding testhost2001 to codfw - jhancock@cumin2002"
  • 15:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T348183)', diff saved to https://phabricator.wikimedia.org/P54184 and previous config saved to /var/cache/conftool/dbconfig/20231205-155858-arnaudb.json
  • 15:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 15:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 15:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T348183)', diff saved to https://phabricator.wikimedia.org/P54183 and previous config saved to /var/cache/conftool/dbconfig/20231205-155814-arnaudb.json
  • 15:57 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:56 hnowlan@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 15:56 hnowlan@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 15:56 hnowlan@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 15:56 hnowlan@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 15:53 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4040.ulsfo.wmnet
  • 15:49 claime: sudo confctl select "service=kubesvc,cluster=dse-k8s" set/pooled=inactive - T352639
  • 15:45 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4040.ulsfo.wmnet
  • 15:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P54182 and previous config saved to /var/cache/conftool/dbconfig/20231205-154308-arnaudb.json
  • 15:42 moritzm: installing monitoring-plugins bugfix updates from Bookworm point release
  • 15:42 claime: Manually restarting pybal on lvs1020 - T352639
  • 15:39 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1002.eqiad.wmnet with OS bookworm
  • 15:31 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1471.eqiad.wmnet with OS bullseye
  • 15:29 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sessionstore2005']
  • 15:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2005']
  • 15:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore2005.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sessionstore2006']
  • 15:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2006']
  • 15:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2005.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P54181 and previous config saved to /var/cache/conftool/dbconfig/20231205-152801-arnaudb.json
  • 15:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host aqs2001.codfw.wmnet
  • 15:22 claime: Manually restarting pybal on lvs1019 - T352639
  • 15:21 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 15:20 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 15:18 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:17 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:16 claime: Manually restarting pybal on lvs1020 - T352639
  • 15:15 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:15 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host aqs2001.codfw.wmnet
  • 15:15 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 15:13 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1471.eqiad.wmnet with reason: host reimage
  • 15:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T348183)', diff saved to https://phabricator.wikimedia.org/P54180 and previous config saved to /var/cache/conftool/dbconfig/20231205-151255-arnaudb.json
  • 15:12 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 15:11 cgoubert@cumin1001: END (FAIL) - Cookbook sre.loadbalancer.restart-pybal (exit_code=1) rolling-restart of pybal on P{lvs[1018,1020].eqiad.wmnet} and A:lvs (T352639)
  • 15:11 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 15:10 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1471.eqiad.wmnet with reason: host reimage
  • 15:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore2005.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:06 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 15:06 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 15:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore2006.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4043.ulsfo.wmnet
  • 15:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T348183)', diff saved to https://phabricator.wikimedia.org/P54179 and previous config saved to /var/cache/conftool/dbconfig/20231205-150243-arnaudb.json
  • 15:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 15:02 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 15:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T348183)', diff saved to https://phabricator.wikimedia.org/P54178 and previous config saved to /var/cache/conftool/dbconfig/20231205-150220-arnaudb.json
  • 15:01 cgoubert@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs[1018,1020].eqiad.wmnet} and A:lvs (T352639)
  • 14:58 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
  • 14:58 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1471.eqiad.wmnet with OS bullseye
  • 14:57 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
  • 14:57 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
  • 14:57 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
  • 14:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2006.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2005.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:54 brouberol: adding k8s-ingress-dse backend to LVS - T352639
  • 14:52 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4043.ulsfo.wmnet
  • 14:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P54177 and previous config saved to /var/cache/conftool/dbconfig/20231205-144714-arnaudb.json
  • 14:45 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
  • 14:45 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
  • 14:44 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding sessionstore2004-6 to codfw - jhancock@cumin2002"
  • 14:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding sessionstore2004-6 to codfw - jhancock@cumin2002"
  • 14:41 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:41 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:41 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:40 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:40 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: redis::misc::master
  • 14:38 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ceph2002']
  • 14:35 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:32 urbanecm@deploy2002: Finished scap: Backport for User impact: update quantizeViews to process small series of view data (T352349), Add maintenance script to import existing files to scan table (T350863), Only allow drawing and bitmap media types to be scanned (T352234) (duration: 08m 55s)
  • 14:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P54176 and previous config saved to /var/cache/conftool/dbconfig/20231205-143207-arnaudb.json
  • 14:30 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: redis::misc::master
  • 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2002']
  • 14:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:26 urbanecm@deploy2002: kharlan and urbanecm: Continuing with sync
  • 14:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:25 urbanecm@deploy2002: kharlan and urbanecm: Backport for User impact: update quantizeViews to process small series of view data (T352349), Add maintenance script to import existing files to scan table (T350863), Only allow drawing and bitmap media types to be scanned (T352234) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:23 urbanecm@deploy2002: Started scap: Backport for User impact: update quantizeViews to process small series of view data (T352349), Add maintenance script to import existing files to scan table (T350863), Only allow drawing and bitmap media types to be scanned (T352234)
  • 14:20 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T348183)', diff saved to https://phabricator.wikimedia.org/P54175 and previous config saved to /var/cache/conftool/dbconfig/20231205-141701-arnaudb.json
  • 14:13 urbanecm@deploy2002: Finished scap: Backport for Growth: Enable Welcome survey user research for ar/en/es (T351266) (duration: 09m 33s)
  • 14:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T348183)', diff saved to https://phabricator.wikimedia.org/P54174 and previous config saved to /var/cache/conftool/dbconfig/20231205-140742-arnaudb.json
  • 14:07 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 14:07 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 14:07 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 14:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T348183)', diff saved to https://phabricator.wikimedia.org/P54173 and previous config saved to /var/cache/conftool/dbconfig/20231205-140720-arnaudb.json
  • 14:06 urbanecm@deploy2002: urbanecm: Backport for Growth: Enable Welcome survey user research for ar/en/es (T351266) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:06 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
  • 14:05 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
  • 14:04 urbanecm@deploy2002: Started scap: Backport for Growth: Enable Welcome survey user research for ar/en/es (T351266)
  • 14:03 moritzm: installing cups security updates
  • 13:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P54172 and previous config saved to /var/cache/conftool/dbconfig/20231205-135213-arnaudb.json
  • 13:51 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4048.ulsfo.wmnet
  • 13:50 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1078.eqiad.wmnet with OS bullseye
  • 13:50 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 13:48 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 13:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1079.eqiad.wmnet with OS bullseye
  • 13:48 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 13:48 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1470.eqiad.wmnet with OS bullseye
  • 13:44 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 13:43 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1465.eqiad.wmnet with OS bullseye
  • 13:41 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4048.ulsfo.wmnet
  • 13:38 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1464.eqiad.wmnet with OS bullseye
  • 13:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P54171 and previous config saved to /var/cache/conftool/dbconfig/20231205-133706-arnaudb.json
  • 13:30 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1470.eqiad.wmnet with reason: host reimage
  • 13:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1078.eqiad.wmnet with reason: host reimage
  • 13:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1076.eqiad.wmnet with OS bullseye
  • 13:27 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 13:26 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1470.eqiad.wmnet with reason: host reimage
  • 13:26 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 13:24 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1465.eqiad.wmnet with reason: host reimage
  • 13:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1079.eqiad.wmnet with reason: host reimage
  • 13:24 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1078.eqiad.wmnet with reason: host reimage
  • 13:23 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1079.eqiad.wmnet with reason: host reimage
  • 13:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T348183)', diff saved to https://phabricator.wikimedia.org/P54169 and previous config saved to /var/cache/conftool/dbconfig/20231205-132200-arnaudb.json
  • 13:21 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1465.eqiad.wmnet with reason: host reimage
  • 13:21 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1464.eqiad.wmnet with reason: host reimage
  • 13:18 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1464.eqiad.wmnet with reason: host reimage
  • 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: redis::misc::slave
  • 13:14 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1470.eqiad.wmnet with OS bullseye
  • 13:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T348183)', diff saved to https://phabricator.wikimedia.org/P54168 and previous config saved to /var/cache/conftool/dbconfig/20231205-131240-arnaudb.json
  • 13:12 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 13:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 13:10 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1078.eqiad.wmnet with OS bullseye
  • 13:09 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1079.eqiad.wmnet with OS bullseye
  • 13:08 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1465.eqiad.wmnet with OS bullseye
  • 13:07 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1076.eqiad.wmnet with reason: host reimage
  • 13:06 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2435.codfw.wmnet with OS bullseye
  • 13:06 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1464.eqiad.wmnet with OS bullseye
  • 13:04 cmooney@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:04 cmooney@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update entry for sretest2003. - cmooney@cumin2002"
  • 13:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 13:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 13:04 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1076.eqiad.wmnet with reason: host reimage
  • 13:04 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 13:04 cmooney@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update entry for sretest2003. - cmooney@cumin2002"
  • 13:03 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 13:02 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1463.eqiad.wmnet with OS bullseye
  • 12:59 cmooney@cumin2002: START - Cookbook sre.dns.netbox
  • 12:58 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2434.codfw.wmnet with OS bullseye
  • 12:57 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: redis::misc::slave
  • 12:56 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 12:56 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 12:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T348183)', diff saved to https://phabricator.wikimedia.org/P54167 and previous config saved to /var/cache/conftool/dbconfig/20231205-125641-arnaudb.json
  • 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4042.ulsfo.wmnet
  • 12:50 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2424.codfw.wmnet with OS bullseye
  • 12:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1076.eqiad.wmnet with OS bullseye
  • 12:47 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2435.codfw.wmnet with reason: host reimage
  • 12:47 ladsgroup@deploy2002: Finished scap: Backport for Set migration of pagelinks on large wikis of s5 to read new (T351237) (duration: 12m 30s)
  • 12:45 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 12:45 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2423.codfw.wmnet with OS bullseye
  • 12:45 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 12:44 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1463.eqiad.wmnet with reason: host reimage
  • 12:42 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2435.codfw.wmnet with reason: host reimage
  • 12:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P54165 and previous config saved to /var/cache/conftool/dbconfig/20231205-124134-arnaudb.json
  • 12:41 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1463.eqiad.wmnet with reason: host reimage
  • 12:40 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 12:39 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2434.codfw.wmnet with reason: host reimage
  • 12:37 ladsgroup@deploy2002: ladsgroup: Backport for Set migration of pagelinks on large wikis of s5 to read new (T351237) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:36 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2434.codfw.wmnet with reason: host reimage
  • 12:34 ladsgroup@deploy2002: Started scap: Backport for Set migration of pagelinks on large wikis of s5 to read new (T351237)
  • 12:32 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4042.ulsfo.wmnet
  • 12:31 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2424.codfw.wmnet with reason: host reimage
  • 12:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4051.ulsfo.wmnet
  • 12:28 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1463.eqiad.wmnet with OS bullseye
  • 12:28 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2424.codfw.wmnet with reason: host reimage
  • 12:27 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 12:26 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2423.codfw.wmnet with reason: host reimage
  • 12:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P54164 and previous config saved to /var/cache/conftool/dbconfig/20231205-122628-arnaudb.json
  • 12:26 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 12:25 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw2435.codfw.wmnet with OS bullseye
  • 12:24 moritzm: installing unbound bugfix updates from Bookworm point release
  • 12:23 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2423.codfw.wmnet with reason: host reimage
  • 12:22 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4051.ulsfo.wmnet
  • 12:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4039.ulsfo.wmnet
  • 12:18 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw2434.codfw.wmnet with OS bullseye
  • 12:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T348183)', diff saved to https://phabricator.wikimedia.org/P54163 and previous config saved to /var/cache/conftool/dbconfig/20231205-121121-arnaudb.json
  • 12:10 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw2424.codfw.wmnet with OS bullseye
  • 12:07 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:07 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:06 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw2423.codfw.wmnet with OS bullseye
  • 12:04 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4039.ulsfo.wmnet
  • 12:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T348183)', diff saved to https://phabricator.wikimedia.org/P54162 and previous config saved to /var/cache/conftool/dbconfig/20231205-120206-arnaudb.json
  • 12:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 12:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 12:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T348183)', diff saved to https://phabricator.wikimedia.org/P54161 and previous config saved to /var/cache/conftool/dbconfig/20231205-120145-arnaudb.json
  • 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4049.ulsfo.wmnet
  • 11:53 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 11:52 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 11:51 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 11:51 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 11:50 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4049.ulsfo.wmnet
  • 11:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P54160 and previous config saved to /var/cache/conftool/dbconfig/20231205-114638-arnaudb.json
  • 11:40 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:40 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:40 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:40 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:38 ladsgroup@deploy2002: Finished scap: Backport for Bump ParserCache TTL back to 30 days (T280604) (duration: 07m 47s)
  • 11:33 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:32 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:32 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 11:32 ladsgroup@deploy2002: ladsgroup: Backport for Bump ParserCache TTL back to 30 days (T280604) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P54159 and previous config saved to /var/cache/conftool/dbconfig/20231205-113132-arnaudb.json
  • 11:30 ladsgroup@deploy2002: Started scap: Backport for Bump ParserCache TTL back to 30 days (T280604)
  • 11:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1023.eqiad.wmnet with OS bookworm
  • 11:17 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:16 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:16 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T348183)', diff saved to https://phabricator.wikimedia.org/P54158 and previous config saved to /var/cache/conftool/dbconfig/20231205-111625-arnaudb.json
  • 11:16 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:15 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:15 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1023.eqiad.wmnet with reason: host reimage
  • 11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1023.eqiad.wmnet with reason: host reimage
  • 11:08 hnowlan@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:08 hnowlan@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 11:07 hnowlan@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:07 hnowlan@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 11:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T348183)', diff saved to https://phabricator.wikimedia.org/P54157 and previous config saved to /var/cache/conftool/dbconfig/20231205-110448-arnaudb.json
  • 11:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 11:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 11:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T348183)', diff saved to https://phabricator.wikimedia.org/P54156 and previous config saved to /var/cache/conftool/dbconfig/20231205-110426-arnaudb.json
  • 11:02 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be1002.eqiad.wmnet with OS bookworm
  • 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bookworm
  • 10:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P54155 and previous config saved to /var/cache/conftool/dbconfig/20231205-104919-arnaudb.json
  • 10:45 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 10:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P54154 and previous config saved to /var/cache/conftool/dbconfig/20231205-103413-arnaudb.json
  • 10:21 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1002.eqiad.wmnet with OS bookworm
  • 10:20 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1003.eqiad.wmnet with OS bookworm
  • 10:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T348183)', diff saved to https://phabricator.wikimedia.org/P54153 and previous config saved to /var/cache/conftool/dbconfig/20231205-101906-arnaudb.json
  • 10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T348183)', diff saved to https://phabricator.wikimedia.org/P54152 and previous config saved to /var/cache/conftool/dbconfig/20231205-100744-arnaudb.json
  • 10:07 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 10:07 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T348183)', diff saved to https://phabricator.wikimedia.org/P54151 and previous config saved to /var/cache/conftool/dbconfig/20231205-100722-arnaudb.json
  • 10:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15305
  • 10:02 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
  • 10:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15305
  • 09:57 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
  • 09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 63927
  • 09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P54150 and previous config saved to /var/cache/conftool/dbconfig/20231205-095215-arnaudb.json
  • 09:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 63927
  • 09:42 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bookworm
  • 09:37 brouberol: running authdns-update on dns1004.wikimedia.org - T352639
  • 09:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P54149 and previous config saved to /var/cache/conftool/dbconfig/20231205-093709-arnaudb.json
  • 09:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T348183)', diff saved to https://phabricator.wikimedia.org/P54148 and previous config saved to /var/cache/conftool/dbconfig/20231205-092202-arnaudb.json
  • 09:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T348183)', diff saved to https://phabricator.wikimedia.org/P54147 and previous config saved to /var/cache/conftool/dbconfig/20231205-091232-arnaudb.json
  • 09:12 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 09:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 09:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 58952
  • 09:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 58952
  • 09:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 09:03 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 08:59 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 08:26 marostegui: Failover m2-master dbproxy1023.eqiad.wmnet -> dbproxy1025.eqiad.wmnet T351864
  • 06:55 vgutierrez: rolling restart of text|secondary LVS on eqsin effectively enabling IPIP encapsulation for ncredir@eqsin - T351069
  • 06:23 marostegui: Failover m5 from db1119 to db1176 - T352631
  • 06:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2135,2160].codfw.wmnet,db[1119,1176,1217].eqiad.wmnet with reason: m5 master switch T352631
  • 06:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2135,2160].codfw.wmnet,db[1119,1176,1217].eqiad.wmnet with reason: m5 master switch T352631
  • 01:18 mutante: LDAP - added user xqt to group nda (T348520)
  • 01:12 ejegg: payments-wiki upgraded from 5284fc99 to 1d24dc90
  • 00:06 eevans@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host restbase2028.codfw.wmnet

2023-12-04

  • 23:53 eevans@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host restbase2028.codfw.wmnet
  • 23:52 eevans@cumin1001: START - Cookbook sre.puppet.migrate-host for host restbase2028.codfw.wmnet
  • 22:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T348183)', diff saved to https://phabricator.wikimedia.org/P54146 and previous config saved to /var/cache/conftool/dbconfig/20231204-225336-arnaudb.json
  • 22:53 eileen: civicrm upgraded from 83816165 to 297a091d
  • 22:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P54145 and previous config saved to /var/cache/conftool/dbconfig/20231204-223830-arnaudb.json
  • 22:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P54144 and previous config saved to /var/cache/conftool/dbconfig/20231204-222323-arnaudb.json
  • 22:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T348183)', diff saved to https://phabricator.wikimedia.org/P54142 and previous config saved to /var/cache/conftool/dbconfig/20231204-220817-arnaudb.json
  • 22:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2189 (T348183)', diff saved to https://phabricator.wikimedia.org/P54141 and previous config saved to /var/cache/conftool/dbconfig/20231204-220345-arnaudb.json
  • 22:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 22:03 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 22:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T348183)', diff saved to https://phabricator.wikimedia.org/P54140 and previous config saved to /var/cache/conftool/dbconfig/20231204-220322-arnaudb.json
  • 21:58 ebernhardson@deploy2002: Finished scap: Backport for Always load transcode state from db when opting in to primary db (duration: 08m 37s)
  • 21:52 ebernhardson@deploy2002: ebernhardson and brion: Continuing with sync
  • 21:51 ebernhardson@deploy2002: ebernhardson and brion: Backport for Always load transcode state from db when opting in to primary db synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:50 ebernhardson@deploy2002: Started scap: Backport for Always load transcode state from db when opting in to primary db
  • 21:49 ebernhardson@deploy2002: Finished scap: Backport for cirrus: Enable event bus bridge on more wikis (T352335) (duration: 09m 23s)
  • 21:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P54138 and previous config saved to /var/cache/conftool/dbconfig/20231204-214816-arnaudb.json
  • 21:47 ryankemper: T351503 Setting partition count to 5: `ryankemper@kafka-main2001:~$ kafka topics --alter --topic codfw.mediawiki.cirrussearch.page_rerender.v1 --partitions 5`
  • 21:47 ryankemper: T351503 Setting partition count to 5: `ryankemper@kafka-main2001:~$ kafka topics --alter --topic eqiad.mediawiki.cirrussearch.page_rerender.v1 --partitions 5`
  • 21:42 ebernhardson@deploy2002: ebernhardson: Continuing with sync
  • 21:41 ebernhardson@deploy2002: ebernhardson: Backport for cirrus: Enable event bus bridge on more wikis (T352335) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:39 ebernhardson@deploy2002: Started scap: Backport for cirrus: Enable event bus bridge on more wikis (T352335)
  • 21:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P54137 and previous config saved to /var/cache/conftool/dbconfig/20231204-213309-arnaudb.json
  • 21:27 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:27 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:19 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1077.eqiad.wmnet with OS bullseye
  • 21:19 pt1979@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
  • 21:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T348183)', diff saved to https://phabricator.wikimedia.org/P54136 and previous config saved to /var/cache/conftool/dbconfig/20231204-211803-arnaudb.json
  • 21:14 pt1979@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
  • 21:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T348183)', diff saved to https://phabricator.wikimedia.org/P54135 and previous config saved to /var/cache/conftool/dbconfig/20231204-211305-arnaudb.json
  • 21:12 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 21:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 21:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54134 and previous config saved to /var/cache/conftool/dbconfig/20231204-211241-arnaudb.json
  • 21:09 ryankemper: T351503 Setting partition count to 5: `ryankemper@kafka-main1001:~$ kafka topics --alter --topic codfw.mediawiki.cirrussearch.page_rerender.v1 --partitions 5`
  • 21:06 ryankemper: T351503 Setting partition count to 5: `ryankemper@kafka-main1001:~$ kafka topics --alter --topic eqiad.mediawiki.cirrussearch.page_rerender.v1 --partitions 5`
  • 20:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P54133 and previous config saved to /var/cache/conftool/dbconfig/20231204-205735-arnaudb.json
  • 20:53 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1077.eqiad.wmnet with reason: host reimage
  • 20:50 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1077.eqiad.wmnet with reason: host reimage
  • 20:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P54132 and previous config saved to /var/cache/conftool/dbconfig/20231204-204228-arnaudb.json
  • 20:36 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1077.eqiad.wmnet with OS bullseye
  • 20:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54131 and previous config saved to /var/cache/conftool/dbconfig/20231204-202722-arnaudb.json
  • 19:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1079.eqiad.wmnet with OS bullseye
  • 19:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1076.eqiad.wmnet with OS bullseye
  • 19:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1078.eqiad.wmnet with OS bullseye
  • 19:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1077.eqiad.wmnet with OS bullseye
  • 19:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54130 and previous config saved to /var/cache/conftool/dbconfig/20231204-194103-arnaudb.json
  • 19:40 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 19:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 19:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T348183)', diff saved to https://phabricator.wikimedia.org/P54129 and previous config saved to /var/cache/conftool/dbconfig/20231204-194039-arnaudb.json
  • 19:37 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:37 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P54128 and previous config saved to /var/cache/conftool/dbconfig/20231204-192532-arnaudb.json
  • 19:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1076.eqiad.wmnet with OS bullseye
  • 19:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1078.eqiad.wmnet with OS bullseye
  • 19:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1079.eqiad.wmnet with OS bullseye
  • 19:20 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1077.eqiad.wmnet with OS bullseye
  • 19:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P54126 and previous config saved to /var/cache/conftool/dbconfig/20231204-191026-arnaudb.json
  • 19:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1079.eqiad.wmnet with OS bullseye
  • 19:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1078.eqiad.wmnet with OS bullseye
  • 19:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1077.eqiad.wmnet with OS bullseye
  • 18:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T348183)', diff saved to https://phabricator.wikimedia.org/P54125 and previous config saved to /var/cache/conftool/dbconfig/20231204-185519-arnaudb.json
  • 18:52 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1076.eqiad.wmnet with OS bullseye
  • 18:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1077.eqiad.wmnet with OS bullseye
  • 18:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1078.eqiad.wmnet with OS bullseye
  • 18:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1079.eqiad.wmnet with OS bullseye
  • 18:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T348183)', diff saved to https://phabricator.wikimedia.org/P54124 and previous config saved to /var/cache/conftool/dbconfig/20231204-184630-arnaudb.json
  • 18:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 18:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 18:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54123 and previous config saved to /var/cache/conftool/dbconfig/20231204-184607-arnaudb.json
  • 18:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P54122 and previous config saved to /var/cache/conftool/dbconfig/20231204-183100-arnaudb.json
  • 18:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P54121 and previous config saved to /var/cache/conftool/dbconfig/20231204-181554-arnaudb.json
  • 18:02 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1077.eqiad.wmnet with OS bullseye
  • 18:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54120 and previous config saved to /var/cache/conftool/dbconfig/20231204-180047-arnaudb.json
  • 17:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1076.eqiad.wmnet with OS bullseye
  • 17:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1078.eqiad.wmnet with OS bullseye
  • 17:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54119 and previous config saved to /var/cache/conftool/dbconfig/20231204-175448-arnaudb.json
  • 17:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 17:54 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 17:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T348183)', diff saved to https://phabricator.wikimedia.org/P54118 and previous config saved to /var/cache/conftool/dbconfig/20231204-175426-arnaudb.json
  • 17:41 ladsgroup@deploy2002: Finished scap: Backport for Category: Stop locking thousands of rows (T352628) (duration: 08m 07s)
  • 17:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P54117 and previous config saved to /var/cache/conftool/dbconfig/20231204-173919-arnaudb.json
  • 17:35 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 17:34 ladsgroup@deploy2002: ladsgroup: Backport for Category: Stop locking thousands of rows (T352628) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:33 ladsgroup@deploy2002: Started scap: Backport for Category: Stop locking thousands of rows (T352628)
  • 17:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P54116 and previous config saved to /var/cache/conftool/dbconfig/20231204-172413-arnaudb.json
  • 17:19 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1076']
  • 17:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
  • 17:18 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1079']
  • 17:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1079']
  • 17:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1079']
  • 17:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1079']
  • 17:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1079']
  • 17:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1079']
  • 17:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1079']
  • 17:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1079']
  • 17:14 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1079']
  • 17:12 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1076']
  • 17:12 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
  • 17:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1076']
  • 17:09 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
  • 17:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1076']
  • 17:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T348183)', diff saved to https://phabricator.wikimedia.org/P54115 and previous config saved to /var/cache/conftool/dbconfig/20231204-170906-arnaudb.json
  • 17:09 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1078.eqiad.wmnet with OS bullseye
  • 17:08 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1077.eqiad.wmnet with OS bullseye
  • 17:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T348183)', diff saved to https://phabricator.wikimedia.org/P54114 and previous config saved to /var/cache/conftool/dbconfig/20231204-170604-arnaudb.json
  • 17:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 17:05 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 17:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 17:05 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 17:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T348183)', diff saved to https://phabricator.wikimedia.org/P54113 and previous config saved to /var/cache/conftool/dbconfig/20231204-170525-arnaudb.json
  • 16:52 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 45s)
  • 16:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P54112 and previous config saved to /var/cache/conftool/dbconfig/20231204-165018-arnaudb.json
  • 16:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 33604
  • 16:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 33604
  • 16:44 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 40s)
  • 16:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P54111 and previous config saved to /var/cache/conftool/dbconfig/20231204-163511-arnaudb.json
  • 16:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T348183)', diff saved to https://phabricator.wikimedia.org/P54110 and previous config saved to /var/cache/conftool/dbconfig/20231204-162005-arnaudb.json
  • 16:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T348183)', diff saved to https://phabricator.wikimedia.org/P54109 and previous config saved to /var/cache/conftool/dbconfig/20231204-161408-arnaudb.json
  • 16:14 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 16:13 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 16:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54108 and previous config saved to /var/cache/conftool/dbconfig/20231204-161346-arnaudb.json
  • 15:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P54107 and previous config saved to /var/cache/conftool/dbconfig/20231204-155840-arnaudb.json
  • 15:56 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1076.eqiad.wmnet with OS bullseye
  • 15:48 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:48 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:47 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:47 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:46 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:45 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P54105 and previous config saved to /var/cache/conftool/dbconfig/20231204-154333-arnaudb.json
  • 15:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54104 and previous config saved to /var/cache/conftool/dbconfig/20231204-152826-arnaudb.json
  • 15:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1077']
  • 15:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1078']
  • 15:03 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1079']
  • 15:02 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1077']
  • 15:02 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1077']
  • 15:02 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1078']
  • 15:02 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1077']
  • 15:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
  • 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4046.ulsfo.wmnet
  • 14:51 vgutierrez: upload tcp-mss-clamper 0.4 to apt.wm.o (bookworm)
  • 14:50 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be1077
  • 14:50 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ms-be1077
  • 14:47 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1076.eqiad.wmnet with OS bullseye
  • 14:46 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4046.ulsfo.wmnet
  • 14:46 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:46 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Create new namespaces and namespace aliases for bd.wikimedia.org (T351903) (duration: 11m 48s)
  • 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4038.ulsfo.wmnet
  • 14:43 sukhe: running authdns-update for CR 979976 [revert of T349665]
  • 14:40 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and mdsshakil: Continuing with sync
  • 14:37 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4038.ulsfo.wmnet
  • 14:36 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and mdsshakil: Backport for Create new namespaces and namespace aliases for bd.wikimedia.org (T351903) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:34 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Create new namespaces and namespace aliases for bd.wikimedia.org (T351903)
  • 14:33 sukhe: running authdns-update for T352579
  • 14:32 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Enable read new for event tables migration on testwiki (T341829) (duration: 10m 42s)
  • 14:32 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 14:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54103 and previous config saved to /var/cache/conftool/dbconfig/20231204-142754-arnaudb.json
  • 14:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 14:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 14:25 lucaswerkmeister-wmde@deploy2002: dreamyjazz and lucaswerkmeister-wmde: Continuing with sync
  • 14:24 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 14:24 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 14:22 lucaswerkmeister-wmde@deploy2002: dreamyjazz and lucaswerkmeister-wmde: Backport for Enable read new for event tables migration on testwiki (T341829) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 14:21 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Enable read new for event tables migration on testwiki (T341829)
  • 14:21 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 14:19 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 14:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 14:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T348183)', diff saved to https://phabricator.wikimedia.org/P54102 and previous config saved to /var/cache/conftool/dbconfig/20231204-141848-arnaudb.json
  • 14:15 jforrester@deploy2002: Finished scap: Backport for wikifunctionswiki: Disable thumbnail in Vector search (T352532), wikifunctionswiki: Add ability for sysops to manage Functioneer (T352495) (duration: 07m 41s)
  • 14:10 jforrester@deploy2002: jforrester and terasail: Continuing with sync
  • 14:09 jforrester@deploy2002: jforrester and terasail: Backport for wikifunctionswiki: Disable thumbnail in Vector search (T352532), wikifunctionswiki: Add ability for sysops to manage Functioneer (T352495) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:08 jforrester@deploy2002: Started scap: Backport for wikifunctionswiki: Disable thumbnail in Vector search (T352532), wikifunctionswiki: Add ability for sysops to manage Functioneer (T352495)
  • 14:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P54101 and previous config saved to /var/cache/conftool/dbconfig/20231204-140341-arnaudb.json
  • 13:59 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:59 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:58 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:57 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:56 moritzm: installing postgresql-13 security updates
  • 13:52 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:52 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P54100 and previous config saved to /var/cache/conftool/dbconfig/20231204-134835-arnaudb.json
  • 13:43 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 13:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T348183)', diff saved to https://phabricator.wikimedia.org/P54099 and previous config saved to /var/cache/conftool/dbconfig/20231204-133328-arnaudb.json
  • 13:30 moritzm: installing dbus security updates on buster
  • 13:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1222 (T348183)', diff saved to https://phabricator.wikimedia.org/P54098 and previous config saved to /var/cache/conftool/dbconfig/20231204-132859-arnaudb.json
  • 13:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 13:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 13:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T348183)', diff saved to https://phabricator.wikimedia.org/P54097 and previous config saved to /var/cache/conftool/dbconfig/20231204-132836-arnaudb.json
  • 13:22 moritzm: installing libde265 security updates
  • 13:22 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:22 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P54096 and previous config saved to /var/cache/conftool/dbconfig/20231204-131329-arnaudb.json
  • 13:06 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:05 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:05 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:04 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P54095 and previous config saved to /var/cache/conftool/dbconfig/20231204-125823-arnaudb.json
  • 12:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T348183)', diff saved to https://phabricator.wikimedia.org/P54094 and previous config saved to /var/cache/conftool/dbconfig/20231204-124316-arnaudb.json
  • 12:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T348183)', diff saved to https://phabricator.wikimedia.org/P54093 and previous config saved to /var/cache/conftool/dbconfig/20231204-124037-arnaudb.json
  • 12:40 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 12:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 12:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54092 and previous config saved to /var/cache/conftool/dbconfig/20231204-124015-arnaudb.json
  • 12:35 urbanecm@deploy2002: Finished scap: Backport for User impact: sort datestring keys to ascending alphanumeric order (T352349 T351898) (duration: 09m 43s)
  • 12:29 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 12:28 urbanecm@deploy2002: urbanecm: Backport for User impact: sort datestring keys to ascending alphanumeric order (T352349 T351898) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host an-druid1005.eqiad.wmnet
  • 12:25 urbanecm@deploy2002: Started scap: Backport for User impact: sort datestring keys to ascending alphanumeric order (T352349 T351898)
  • 12:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P54091 and previous config saved to /var/cache/conftool/dbconfig/20231204-122508-arnaudb.json
  • 12:19 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:19 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host an-druid1005.eqiad.wmnet
  • 12:18 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1027.eqiad.wmnet with OS bookworm
  • 12:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P54090 and previous config saved to /var/cache/conftool/dbconfig/20231204-121002-arnaudb.json
  • 12:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host druid1011.eqiad.wmnet
  • 12:00 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host druid1011.eqiad.wmnet
  • 11:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1027.eqiad.wmnet with reason: host reimage
  • 11:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54089 and previous config saved to /var/cache/conftool/dbconfig/20231204-115455-arnaudb.json
  • 11:54 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2422.codfw.wmnet with OS bullseye
  • 11:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1027.eqiad.wmnet with reason: host reimage
  • 11:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54088 and previous config saved to /var/cache/conftool/dbconfig/20231204-115217-arnaudb.json
  • 11:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 11:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 11:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T348183)', diff saved to https://phabricator.wikimedia.org/P54087 and previous config saved to /var/cache/conftool/dbconfig/20231204-115154-arnaudb.json
  • 11:51 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1462.eqiad.wmnet with OS bullseye
  • 11:43 elukey@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:43 elukey@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 11:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 44592
  • 11:42 elukey@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 11:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 44592
  • 11:42 elukey@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 11:40 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 11:39 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 11:39 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1027.eqiad.wmnet with OS bookworm
  • 11:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P54086 and previous config saved to /var/cache/conftool/dbconfig/20231204-113648-arnaudb.json
  • 11:36 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2422.codfw.wmnet with reason: host reimage
  • 11:33 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1462.eqiad.wmnet with reason: host reimage
  • 11:32 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2422.codfw.wmnet with reason: host reimage
  • 11:30 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1462.eqiad.wmnet with reason: host reimage
  • 11:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P54085 and previous config saved to /var/cache/conftool/dbconfig/20231204-112141-arnaudb.json
  • 11:17 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1462.eqiad.wmnet with OS bullseye
  • 11:15 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw2422.codfw.wmnet with OS bullseye
  • 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: eventschemas::service
  • 11:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T348183)', diff saved to https://phabricator.wikimedia.org/P54084 and previous config saved to /var/cache/conftool/dbconfig/20231204-110635-arnaudb.json
  • 11:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T348183)', diff saved to https://phabricator.wikimedia.org/P54083 and previous config saved to /var/cache/conftool/dbconfig/20231204-110156-arnaudb.json
  • 11:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 11:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 11:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54082 and previous config saved to /var/cache/conftool/dbconfig/20231204-110134-arnaudb.json
  • 10:54 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: eventschemas::service
  • 10:51 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:51 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add service records for the k8s-ingress-dse endpoints - btullis@cumin1001"
  • 10:50 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add service records for the k8s-ingress-dse endpoints - btullis@cumin1001"
  • 10:48 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 10:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P54081 and previous config saved to /var/cache/conftool/dbconfig/20231204-104628-arnaudb.json
  • 10:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 23856
  • 10:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 23856
  • 10:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 63927
  • 10:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 63927
  • 10:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 31898
  • 10:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 31898
  • 10:37 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58952
  • 10:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58952
  • 10:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 44592
  • 10:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 44592
  • 10:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4800
  • 10:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4800
  • 10:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 33604
  • 10:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 33604
  • 10:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 142505
  • 10:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 142505
  • 10:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 398446
  • 10:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 398446
  • 10:32 jayme: upgrade istio (buster -> bullseye) on wikikube codfw - T351933
  • 10:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15305
  • 10:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15305
  • 10:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19165
  • 10:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P54080 and previous config saved to /var/cache/conftool/dbconfig/20231204-103121-arnaudb.json
  • 10:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 19165
  • 10:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 237
  • 10:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 237
  • 10:28 jayme: upgrade istio (buster -> bullseye) on wikikube eqiad - T351933
  • 10:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 35 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
  • 10:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 35 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
  • 10:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1022.eqiad.wmnet with OS bookworm
  • 10:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 138997
  • 10:17 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 138997
  • 10:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54079 and previous config saved to /var/cache/conftool/dbconfig/20231204-101615-arnaudb.json
  • 10:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54078 and previous config saved to /var/cache/conftool/dbconfig/20231204-101143-arnaudb.json
  • 10:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:11 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T348183)', diff saved to https://phabricator.wikimedia.org/P54077 and previous config saved to /var/cache/conftool/dbconfig/20231204-101120-arnaudb.json
  • 10:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1022.eqiad.wmnet with reason: host reimage
  • 09:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1022.eqiad.wmnet with reason: host reimage
  • 09:58 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 09:57 godog: roll-restart prometheus/k8s to apply size-based retention - T351179
  • 09:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P54076 and previous config saved to /var/cache/conftool/dbconfig/20231204-095614-arnaudb.json
  • 09:49 volans@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 09:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P54075 and previous config saved to /var/cache/conftool/dbconfig/20231204-094107-arnaudb.json
  • 09:36 elukey: upgrade istio (buster -> bullseye) on ml-serve-codfw - T351933
  • 09:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T348183)', diff saved to https://phabricator.wikimedia.org/P54074 and previous config saved to /var/cache/conftool/dbconfig/20231204-092600-arnaudb.json
  • 09:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T348183)', diff saved to https://phabricator.wikimedia.org/P54073 and previous config saved to /var/cache/conftool/dbconfig/20231204-092136-arnaudb.json
  • 09:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:21 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 09:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 09:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54072 and previous config saved to /var/cache/conftool/dbconfig/20231204-092054-arnaudb.json
  • 09:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P54070 and previous config saved to /var/cache/conftool/dbconfig/20231204-090547-arnaudb.json
  • 08:58 elukey: upgrade istio (buster -> bullseye) on ml-serve-eqiad - T351933
  • 08:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P54069 and previous config saved to /var/cache/conftool/dbconfig/20231204-085041-arnaudb.json
  • 08:50 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bookworm
  • 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM moscovium.eqiad.wmnet
  • 08:48 elukey: upgrade istio (buster -> bullseye) on aux-k8s-eqiad - T351933
  • 08:45 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bookworm
  • 08:43 elukey: upgrade istio (buster -> bullseye) on dse-k8s-eqiad - T351933
  • 08:39 urbanecm@deploy2002: Finished scap: Backport for hewikivoyage: add tagline (T351981), azwiki: Enable $wgMinervaEnableSiteNotice (T352621), trwikivoyage: update wordmark (T352329) (duration: 09m 49s)
  • 08:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54068 and previous config saved to /var/cache/conftool/dbconfig/20231204-083534-arnaudb.json
  • 08:33 urbanecm@deploy2002: urbanecm and anzx: Continuing with sync
  • 08:31 urbanecm@deploy2002: urbanecm and anzx: Backport for hewikivoyage: add tagline (T351981), azwiki: Enable $wgMinervaEnableSiteNotice (T352621), trwikivoyage: update wordmark (T352329) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54067 and previous config saved to /var/cache/conftool/dbconfig/20231204-083102-arnaudb.json
  • 08:30 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 08:30 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 08:29 urbanecm@deploy2002: Started scap: Backport for hewikivoyage: add tagline (T351981), azwiki: Enable $wgMinervaEnableSiteNotice (T352621), trwikivoyage: update wordmark (T352329)
  • 08:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 08:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 08:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T348183)', diff saved to https://phabricator.wikimedia.org/P54066 and previous config saved to /var/cache/conftool/dbconfig/20231204-082758-arnaudb.json
  • 08:25 oblivian@deploy2002: Finished scap: Backport for Add throttle rule for editathon (T352569) (duration: 18m 04s)
  • 08:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM moscovium.eqiad.wmnet
  • 08:23 _joe_: clearing throttle cache for T352569
  • 08:18 oblivian@deploy2002: oblivian: Continuing with sync
  • 08:17 oblivian@deploy2002: oblivian: Backport for Add throttle rule for editathon (T352569) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P54065 and previous config saved to /var/cache/conftool/dbconfig/20231204-081251-arnaudb.json
  • 08:11 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bookworm
  • 08:10 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bookworm
  • 08:07 oblivian@deploy2002: Started scap: Backport for Add throttle rule for editathon (T352569)
  • 07:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P54064 and previous config saved to /var/cache/conftool/dbconfig/20231204-075745-arnaudb.json
  • 07:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bookworm
  • 07:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T348183)', diff saved to https://phabricator.wikimedia.org/P54063 and previous config saved to /var/cache/conftool/dbconfig/20231204-074238-arnaudb.json
  • 07:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T348183)', diff saved to https://phabricator.wikimedia.org/P54062 and previous config saved to /var/cache/conftool/dbconfig/20231204-073957-arnaudb.json
  • 07:39 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 07:39 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1176.eqiad.wmnet with OS bookworm
  • 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
  • 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
  • 07:07 kart_: Updated MinT to 2023-11-21-115852-production
  • 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1176.eqiad.wmnet with OS bookworm
  • 06:57 marostegui: Failover m5 from db1176 to db1119 - T332155
  • 06:49 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 06:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2135,2160].codfw.wmnet,db[1119,1176,1217].eqiad.wmnet with reason: m5 master switch T352505
  • 06:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2135,2160].codfw.wmnet,db[1119,1176,1217].eqiad.wmnet with reason: m5 master switch T352505
  • 06:44 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 06:33 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 06:28 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 06:14 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 06:11 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 06:08 kart_: Updated cxserver to 2023-12-04-055024-production (T270060, T350773, T352620)
  • 06:06 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:05 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:03 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:02 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:59 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:58 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 04:43 ryankemper: [WDQS] Clearing `BlazegraphFreeAllocatorsDecreasingRapidly` -> `ryankemper@wdqs1007:~$ sudo systemctl restart wdqs-blazegraph`
  • 00:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1006.eqiad.wmnet
  • 00:09 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1006.eqiad.wmnet

2023-12-02

  • 01:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1078.eqiad.wmnet with OS bullseye
  • 01:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1079.eqiad.wmnet with OS bullseye
  • 01:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1077.eqiad.wmnet with OS bullseye
  • 01:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1076.eqiad.wmnet with OS bullseye
  • 00:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1078.eqiad.wmnet with OS bullseye
  • 00:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1079.eqiad.wmnet with OS bullseye
  • 00:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1077.eqiad.wmnet with OS bullseye
  • 00:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1076.eqiad.wmnet with OS bullseye
  • 00:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1076']
  • 00:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
  • 00:14 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1076']
  • 00:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
  • 00:14 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1076']
  • 00:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
  • 00:14 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-be1076']
  • 00:14 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-be1076']
  • 00:14 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-be1076']
  • 00:14 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-be1076']
  • 00:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
  • 00:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
  • 00:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
  • 00:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
  • 00:13 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:13 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:13 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:12 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED

2023-12-01

  • 22:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:14 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:14 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
  • 22:13 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
  • 22:11 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 22:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:31 cstone: payments-wiki upgraded from b37ab50e to 5284fc99
  • 19:35 inflatador: bking@wdqs1006 rebooting unresponsive host
  • 18:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ceph2001.codfw.wmnet with OS bullseye
  • 17:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ceph2001.codfw.wmnet with OS bullseye
  • 16:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ceph2001.codfw.wmnet with OS bullseye
  • 16:39 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1005.eqiad.wmnet with OS bookworm
  • 16:26 dancy@deploy2002: Installation of scap version "4.65.0" completed for 537 hosts
  • 16:26 dancy@deploy2002: Installing scap version "4.65.0" for 537 hosts
  • 16:25 dancy@deploy2002: install-world aborted: (duration: 00m 50s)
  • 16:24 dancy@deploy2002: Installing scap version "4.65.0" for 569 hosts
  • 16:24 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt1046.eqiad.wmnet
  • 16:10 fnegri@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1046.eqiad.wmnet
  • 16:07 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
  • 16:04 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
  • 16:01 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:01 akosiaris@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to scandium - akosiaris@cumin1001"
  • 16:00 akosiaris@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to scandium - akosiaris@cumin1001"
  • 15:58 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 15:58 akosiaris: give AAAA and PTR records to scandium T271142
  • 15:57 akosiaris: give AAAA and PTR records to all rdb hosts (only 50% had it previously)
  • 15:56 dancy@deploy2002: Installing scap version "4.65.0" for 570 hosts
  • 15:55 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:55 akosiaris@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add AAAA records to the rest of the 50% of rdb hosts - akosiaris@cumin1001"
  • 15:54 akosiaris@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add AAAA records to the rest of the 50% of rdb hosts - akosiaris@cumin1001"
  • 15:52 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 15:51 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[1009-1010].eqiad.wmnet
  • 15:51 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:51 akosiaris@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rdb[1009-1010].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - akosiaris@cumin1001"
  • 15:50 akosiaris@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rdb[1009-1010].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - akosiaris@cumin1001"
  • 15:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1005.eqiad.wmnet with OS bookworm
  • 15:45 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 15:42 urbanecm: mwmaint2002: mwscript extensions/Flow/maintenance/FlowFixInconsistentBoards.php --wiki=frwiki # T352550
  • 15:38 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:38 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:36 akosiaris@deploy2002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 07m 24s)
  • 15:31 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:31 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:28 moritzm: added Kamila to pwstore
  • 15:21 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[1009-1010].eqiad.wmnet
  • 15:19 topranks: moving esams CR interconnect to 4x10G breakout cable T347403
  • 14:27 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 14:27 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 14:27 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:27 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:27 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 14:27 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 14:26 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 14:26 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 14:26 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:26 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:26 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 14:26 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 14:26 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 14:26 akosiaris: cleanup rdb1009 from all deployment charts
  • 14:26 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 14:26 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:26 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:25 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 14:25 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 14:20 hashar@deploy2002: Finished deploy [integration/docroot@88f69cc]: doc: link to the Gearman Java library (duration: 00m 05s)
  • 14:20 hashar@deploy2002: Started deploy [integration/docroot@88f69cc]: doc: link to the Gearman Java library
  • 14:18 hashar@deploy2002: Finished deploy [integration/docroot@1c2de6b]: doc: link to Disovery parent pom (duration: 00m 06s)
  • 14:18 hashar@deploy2002: Started deploy [integration/docroot@1c2de6b]: doc: link to Disovery parent pom
  • 14:09 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:08 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:05 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:05 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:03 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:03 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:48 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 13:48 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 13:32 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 13:31 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 13:30 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 13:30 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 13:28 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 13:28 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 13:27 taavi: run prometheus provision-fs on prometheus2* to create file system for cloud instance T350010
  • 13:13 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 13:13 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 12:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 12:39 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 12:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flerovium.eqiad.wmnet
  • 12:37 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flerovium.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 12:36 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flerovium.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 12:34 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 12:33 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 12:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts flerovium.eqiad.wmnet
  • 12:17 XioNoX: add BGP custom field to Netbox - T306649
  • 12:07 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
  • 12:03 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
  • 12:03 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jbond out of all services on: 2211 hosts
  • 12:02 root@cumin2002: START - Cookbook sre.idm.logout Logging Jbond out of all services on: 2211 hosts
  • 11:49 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 11:30 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on cr[1-2]-codfw,cr[1-2]-codfw IPv6 with reason: resetting line card
  • 11:30 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on cr[1-2]-codfw,cr[1-2]-codfw IPv6 with reason: resetting line card
  • 11:29 topranks: Reset card 1/0 in cr1-codfw T350159
  • 11:22 topranks: Disabling BGP peering to AS1299 prior to reset of card 1/0 in cr1-codfw T350159
  • 11:09 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jbond out of all services on: 2 hosts
  • 11:09 root@cumin2002: START - Cookbook sre.idm.logout Logging Jbond out of all services on: 2 hosts
  • 11:04 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jbond out of all services on: 2 hosts
  • 11:04 root@cumin2002: START - Cookbook sre.idm.logout Logging Jbond out of all services on: 2 hosts
  • 11:00 topranks: Draining cr1-codfw transport to cr3-eqsin to reset card 1/0 T350159
  • 10:59 topranks: Resetting circuit preference for transports landing on card 1/1 cr1-codfw T350159
  • 10:55 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 10:49 moritzm: installing wireshark security updates on bookworm
  • 10:37 topranks: Moving VRRP active gateway for codfw row A/B vlans from cr1-codfw to cr2-codfw to reconfigure card 1/1 T350159
  • 10:35 topranks: draining codfw<->eqiad transport link to reconfigure card 1/1 in cr1-codfw T350159
  • 10:34 topranks: draining codfw<->eqdfw transport link to reconfigure card 1/1 in cr1-codfw T350159
  • 10:30 akosiaris@deploy2002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 07m 12s)
  • 10:08 godog: add 60GB to prometheus/k8s in codfw
  • 09:51 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jbond out of all services on: 2 hosts
  • 09:51 root@cumin2002: START - Cookbook sre.idm.logout Logging Jbond out of all services on: 2 hosts
  • 09:45 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jbond out of all services on: 2211 hosts
  • 09:44 root@cumin2002: START - Cookbook sre.idm.logout Logging Jbond out of all services on: 2211 hosts
  • 09:20 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 09:05 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 08:59 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 08:57 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 08:50 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1026.eqiad.wmnet with OS bookworm
  • 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1026.eqiad.wmnet with reason: host reimage
  • 07:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1026.eqiad.wmnet with reason: host reimage
  • 07:12 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1026.eqiad.wmnet with OS bookworm
  • 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2135.codfw.wmnet with OS bookworm
  • 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2135.codfw.wmnet with reason: host reimage
  • 06:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2135.codfw.wmnet with reason: host reimage
  • 05:56 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2135.codfw.wmnet with OS bookworm
  • 05:37 marostegui: Failover m3 from db1119 to db1159 - T352360
  • 05:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2134,2160].codfw.wmnet,db[1119,1159,1217].eqiad.wmnet with reason: m3 master switchover T352149
  • 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2134,2160].codfw.wmnet,db[1119,1159,1217].eqiad.wmnet with reason: m3 master switchover T352149
  • 02:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2109.codfw.wmnet with OS bookworm
  • 02:31 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:28 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2107.codfw.wmnet with OS bookworm
  • 02:27 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2108.codfw.wmnet with OS bookworm
  • 02:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:24 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:18 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2106.codfw.wmnet with OS bookworm
  • 02:17 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:16 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2105.codfw.wmnet with OS bookworm
  • 02:16 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:11 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2109.codfw.wmnet with reason: host reimage
  • 02:07 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2109.codfw.wmnet with reason: host reimage
  • 02:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2108.codfw.wmnet with reason: host reimage
  • 02:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2107.codfw.wmnet with reason: host reimage
  • 02:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2108.codfw.wmnet with reason: host reimage
  • 01:58 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2107.codfw.wmnet with reason: host reimage
  • 01:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2106.codfw.wmnet with reason: host reimage
  • 01:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ceph2003']
  • 01:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ceph2001']
  • 01:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2106.codfw.wmnet with reason: host reimage
  • 01:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2105.codfw.wmnet with reason: host reimage
  • 01:51 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2105.codfw.wmnet with reason: host reimage
  • 01:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2109.codfw.wmnet with OS bookworm
  • 01:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2108.codfw.wmnet with OS bookworm
  • 01:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ceph2002']
  • 01:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2104.codfw.wmnet with OS bookworm
  • 01:40 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:40 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2002']
  • 01:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2107.codfw.wmnet with OS bookworm
  • 01:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ceph2002']
  • 01:40 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2003']
  • 01:40 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2002']
  • 01:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2001']
  • 01:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ceph2003']
  • 01:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ceph2002']
  • 01:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ceph2001']
  • 01:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2002']
  • 01:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2001']
  • 01:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2003']
  • 01:38 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2106.codfw.wmnet with OS bookworm
  • 01:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2103.codfw.wmnet with OS bookworm
  • 01:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:32 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2105.codfw.wmnet with OS bookworm
  • 01:32 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2102.codfw.wmnet with OS bookworm
  • 01:31 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2100.codfw.wmnet with OS bookworm
  • 01:29 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2101.codfw.wmnet with OS bookworm
  • 01:29 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:28 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:22 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2104.codfw.wmnet with reason: host reimage
  • 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ceph2001-3 to codfw - jhancock@cumin2002"
  • 01:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ceph2001-3 to codfw - jhancock@cumin2002"
  • 01:18 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2104.codfw.wmnet with reason: host reimage
  • 01:17 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 01:14 foks: removing 120 files for legal compliance
  • 01:11 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2103.codfw.wmnet with reason: host reimage
  • 01:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2100.codfw.wmnet with reason: host reimage
  • 01:07 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2102.codfw.wmnet with reason: host reimage
  • 01:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2100.codfw.wmnet with reason: host reimage
  • 01:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2101.codfw.wmnet with reason: host reimage
  • 01:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2101.codfw.wmnet with reason: host reimage
  • 00:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2104.codfw.wmnet with OS bookworm
  • 00:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2103.codfw.wmnet with OS bookworm
  • 00:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2102.codfw.wmnet with OS bookworm
  • 00:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2101.codfw.wmnet with OS bookworm
  • 00:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2100.codfw.wmnet with OS bookworm
  • 00:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2098.codfw.wmnet with OS bookworm
  • 00:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2099.codfw.wmnet with OS bookworm
  • 00:38 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2097.codfw.wmnet with OS bookworm
  • 00:38 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2094.codfw.wmnet with OS bookworm
  • 00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:35 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:25 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1107.eqiad.wmnet with OS bookworm
  • 00:22 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1107.eqiad.wmnet with OS bookworm
  • 00:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2099.codfw.wmnet with reason: host reimage
  • 00:14 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2099.codfw.wmnet with reason: host reimage
  • 00:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1105.eqiad.wmnet with OS bookworm
  • 00:09 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:08 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2098.codfw.wmnet with reason: host reimage
  • 00:05 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1107.eqiad.wmnet with OS bookworm
  • 00:05 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:03 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2098.codfw.wmnet with reason: host reimage
  • 00:01 krinkle@deploy2002: Synchronized wmf-config/CommonSettings.php: (no justification provided) (duration: 06m 37s)
  • 00:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2094.codfw.wmnet with reason: host reimage

