Server Admin Log/Archive 74

2023-12-30

16:55 otto@deploy2002: Synchronized wmf-config/ext-EventStreamConfig.php: Config: Add eventlogging_MediaWikiPingback stream (T323828) (duration: 15m 10s)

2023-12-29

22:59 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:59 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
22:57 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:01 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
08:00 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:00 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
07:58 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:58 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
07:58 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:58 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
07:57 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
07:57 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
00:12 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:11 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
00:10 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:10 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
00:09 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:08 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
00:08 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:08 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
00:07 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:07 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
00:06 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:06 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
00:03 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:03 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
00:03 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:02 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
00:01 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:01 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply

2023-12-28

23:59 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:59 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:58 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:57 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:57 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:52 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:51 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:50 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:48 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:47 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:47 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:46 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:46 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:46 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:45 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:35 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:35 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:20 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:20 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply

2023-12-27

22:53 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:53 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
22:46 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:46 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
22:41 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:40 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply

2023-12-23

20:22 _joe_: downgraded vopsbot on alert1001, hopefully it should not keep panicking in this unexpected situation
15:40 taavi: fix date-time on mw2448 (which thought it was the year 2098) by manually setting it once and then restarting systemd-timesyncd.service after BIOS was reset in T353679
01:19 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
01:19 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.

2023-12-22

17:28 krinkle@deploy2002: Synchronized php-1.42.0-wmf.10/includes/skins/Skin.php: Ice6d6c (duration: 06m 25s)
15:16 jgiannelos@deploy2002: Finished deploy [restbase/deploy@5f2756a]: (no justification provided) (duration: 17m 36s)
14:58 jgiannelos@deploy2002: Started deploy [restbase/deploy@5f2756a]: (no justification provided)
14:57 jgiannelos@deploy2002: Finished deploy [restbase/deploy@f0c9f9f]: (no justification provided) (duration: 09m 32s)
14:48 jgiannelos@deploy2002: Started deploy [restbase/deploy@f0c9f9f]: (no justification provided)
14:01 jgiannelos@deploy2002: Finished deploy [restbase/deploy@4f56fff]: (no justification provided) (duration: 16m 57s)
13:45 reedy@deploy2002: Finished scap: T353920 (duration: 08m 02s)
13:44 jgiannelos@deploy2002: Started deploy [restbase/deploy@4f56fff]: (no justification provided)
13:37 reedy@deploy2002: Started scap: T353920
11:31 vgutierrez: upload golang-github-intel-go-cpuid_0.0~git20210602.5747e5c-2+deb12u1 to apt.wm.o (bookworm)
10:42 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
10:42 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
10:39 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
09:57 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .

2023-12-21

21:42 wfan: payment-wiki revision 1c96980a -> 3b281d10
19:31 reedy@deploy2002: Synchronized wmf-config/CommonSettings.php: T346919 (duration: 06m 26s)
19:14 dancy@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.10 refs T350086
18:39 mutante: releases1003 - sudo chmod -R g+w /srv/org/wikimedia/releases/mediawiki/1.*
17:26 mutante: mirror1001 - when syncing tails mirror - @ERROR: max connections (23) reached -- try again later
17:23 mutante: [mirror1001:~] $ sudo systemctl start update-tails-mirror
17:04 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
17:03 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
17:03 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
17:03 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
17:02 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
17:02 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
16:27 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
16:27 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
16:26 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
16:26 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
16:18 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
16:17 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
16:10 volans@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wdqs1008.eqiad.wmnet
16:10 volans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:08 volans@cumin1002: START - Cookbook sre.dns.netbox
16:03 volans@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs1008.eqiad.wmnet
15:59 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
15:58 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
15:54 volans@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wdqs1007.eqiad.wmnet
15:54 volans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:53 volans@cumin1002: START - Cookbook sre.dns.netbox
15:47 volans@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs1007.eqiad.wmnet
15:44 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:44 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
15:38 kharlan@deploy2002: Finished scap: Backport for Use username for lookup for non-existing user as the vague target (duration: 10m 37s)
15:36 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:35 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:32 kharlan@deploy2002: kharlan and dreamyjazz: Continuing with sync
15:30 kharlan@deploy2002: kharlan and dreamyjazz: Backport for Use username for lookup for non-existing user as the vague target synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:28 kharlan@deploy2002: Started scap: Backport for Use username for lookup for non-existing user as the vague target
15:24 kharlan@deploy2002: Finished scap: Backport for Use username for lookup for non-existing user as the vague target (duration: 11m 38s)
15:20 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:19 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
15:18 kharlan@deploy2002: kharlan and dreamyjazz: Continuing with sync
15:15 kharlan@deploy2002: kharlan and dreamyjazz: Backport for Use username for lookup for non-existing user as the vague target synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:13 kharlan@deploy2002: Started scap: Backport for Use username for lookup for non-existing user as the vague target
15:11 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:10 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:52 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Fix showing units and limits in NewPP limit report (T353793) (duration: 09m 27s)
14:46 lucaswerkmeister-wmde@deploy2002: matmarex and lucaswerkmeister-wmde: Continuing with sync
14:44 lucaswerkmeister-wmde@deploy2002: matmarex and lucaswerkmeister-wmde: Backport for Fix showing units and limits in NewPP limit report (T353793) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:43 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Fix showing units and limits in NewPP limit report (T353793)
14:37 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:36 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:31 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:29 jclark@cumin1002: START - Cookbook sre.dns.netbox
14:27 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Ignore "exact match" title when the title is not given (T353860) (duration: 08m 33s)
14:21 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Continuing with sync
14:19 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Backport for Ignore "exact match" title when the title is not given (T353860) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:18 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Ignore "exact match" title when the title is not given (T353860)
14:17 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2002:~$ mwscript namespaceDupes bdwikimedia --fix # T351903 – 62 pages to fix, 62 were resolvable. 56 links to fix, 54 were resolvable, 2 were deleted.
14:15 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for uzwikipedia: add a temporary logo for the 20th anniversary (T353723) (duration: 09m 28s)
14:13 moritzm: re-added Eoghan to pwstore
14:09 lucaswerkmeister-wmde@deploy2002: anzx and lucaswerkmeister-wmde: Continuing with sync
14:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18 days, 0:00:00 on 10 hosts with reason: T352878
14:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 18 days, 0:00:00 on 10 hosts with reason: T352878
14:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18 days, 0:00:00 on 13 hosts with reason: T352878
14:08 lucaswerkmeister-wmde@deploy2002: anzx and lucaswerkmeister-wmde: Backport for uzwikipedia: add a temporary logo for the 20th anniversary (T353723) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:07 bking@cumin2002: START - Cookbook sre.hosts.downtime for 18 days, 0:00:00 on 13 hosts with reason: T352878
14:06 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for uzwikipedia: add a temporary logo for the 20th anniversary (T353723)
13:50 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
13:23 moritzm: installing libde265 security updates
12:29 volans@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wdqs1006.eqiad.wmnet
12:29 volans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:27 volans@cumin1002: START - Cookbook sre.dns.netbox
12:20 volans@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs1006.eqiad.wmnet
12:18 volans@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin1002.eqiad.wmnet with reason: Release v0.6.5 - volans@cumin1002
12:14 volans@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin1002.eqiad.wmnet with reason: Release v0.6.5 - volans@cumin1002
11:37 claime: Manually restarted cassandra-a service on restbase2028 following OOM - T353456
11:23 volans@cumin1002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts wdqs1006.eqiad.wmnet
11:22 volans@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs1006.eqiad.wmnet
11:16 volans@cumin1002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts wdqs1006.eqiad.wmnet
11:13 volans@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs1006.eqiad.wmnet
10:42 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
10:42 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
10:29 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
09:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1006
09:40 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1006
08:59 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
08:54 apergos: UTC morning backport and config window done
08:50 ariel@deploy2002: Finished scap: Backport for CommentFormatter: Do not add wrapper if the heading has attributes (T353489) (duration: 12m 39s)
08:44 ariel@deploy2002: ariel and matmarex: Continuing with sync
08:39 ariel@deploy2002: ariel and matmarex: Backport for CommentFormatter: Do not add wrapper if the heading has attributes (T353489) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:37 ariel@deploy2002: Started scap: Backport for CommentFormatter: Do not add wrapper if the heading has attributes (T353489)
08:25 ariel@deploy2002: Finished scap: Backport for CommentFormatter: Do not add wrapper if the heading has attributes (T353489) (duration: 11m 07s)
08:19 ariel@deploy2002: matmarex and ariel: Continuing with sync
08:16 ariel@deploy2002: matmarex and ariel: Backport for CommentFormatter: Do not add wrapper if the heading has attributes (T353489) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:14 ariel@deploy2002: Started scap: Backport for CommentFormatter: Do not add wrapper if the heading has attributes (T353489)
05:56 kart_: Updated MinT to 2023-12-20-071058-production
05:50 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
05:42 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
05:40 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
05:35 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
05:29 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
05:26 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2075.codfw.wmnet with OS bullseye
01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
00:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2075.codfw.wmnet with reason: host reimage
00:24 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2075.codfw.wmnet with reason: host reimage

2023-12-20

23:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2075.codfw.wmnet with OS bullseye
23:44 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2075.codfw.wmnet with OS bullseye
23:24 ryankemper@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host netbox1002
23:24 ryankemper@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host netbox1002
23:19 ryankemper@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wdqs1006
23:19 ryankemper@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1006
23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2075.codfw.wmnet with OS bullseye
22:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs[1020-1021].eqiad.wmnet
22:59 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for wdqs[1020-1021].eqiad.wmnet
22:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18 days, 0:00:00 on wdqs[1020-1024].eqiad.wmnet with reason: T352878
22:58 bking@cumin2002: START - Cookbook sre.hosts.downtime for 18 days, 0:00:00 on wdqs[1020-1024].eqiad.wmnet with reason: T352878
22:25 ryankemper@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wdqs[1006-1008].eqiad.wmnet
22:25 ryankemper@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:25 ryankemper@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[1006-1008].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1002"
22:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2075.mgmt.codfw.wmnet with reboot policy FORCED
22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2080.codfw.wmnet with OS bullseye
22:24 ryankemper@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[1006-1008].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1002"
22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2079.codfw.wmnet with OS bullseye
22:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2075.mgmt.codfw.wmnet with reboot policy FORCED
22:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2075']
22:21 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2075']
22:20 ryankemper@cumin1002: START - Cookbook sre.dns.netbox
22:18 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
22:18 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
22:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2077.codfw.wmnet with OS bullseye
22:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:17 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
22:17 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
22:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2078.codfw.wmnet with OS bullseye
22:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2076.codfw.wmnet with OS bullseye
22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:16 ryankemper@cumin1002: START - Cookbook sre.hosts.decommission for hosts wdqs[1006-1008].eqiad.wmnet
22:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:13 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
22:12 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
22:10 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
22:09 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
22:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:08 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
22:08 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
22:06 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
22:05 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
22:03 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2033.codfw.wmnet with OS bullseye
22:03 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
22:02 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
21:59 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
21:59 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
21:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2080.codfw.wmnet with reason: host reimage
21:59 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
21:59 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2079.codfw.wmnet with reason: host reimage
21:56 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
21:56 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
21:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2077.codfw.wmnet with reason: host reimage
21:54 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
21:54 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
21:53 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
21:53 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2076.codfw.wmnet with reason: host reimage
21:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2080.codfw.wmnet with reason: host reimage
21:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2079.codfw.wmnet with reason: host reimage
21:48 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
21:48 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
21:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2077.codfw.wmnet with reason: host reimage
21:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2078.codfw.wmnet with reason: host reimage
21:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2076.codfw.wmnet with reason: host reimage
21:48 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
21:47 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
21:47 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
21:46 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
21:45 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
21:45 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lsw1-a8-codfw,lsw1-a8-codfw IPv6 with reason: testing commit confirm check in cookbook
21:45 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lsw1-a8-codfw,lsw1-a8-codfw IPv6 with reason: testing commit confirm check in cookbook
21:41 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
21:40 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
21:39 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
21:39 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
21:38 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
21:37 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
21:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2080.codfw.wmnet with OS bullseye
21:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2079.codfw.wmnet with OS bullseye
21:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2078.codfw.wmnet with OS bullseye
21:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2077.codfw.wmnet with OS bullseye
21:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2076.codfw.wmnet with OS bullseye
21:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2075.codfw.wmnet with OS bullseye
21:30 dancy@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.10 refs T350086 (duration: 05m 57s)
21:28 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host restbase2033.codfw.wmnet with OS bullseye
21:26 eevans@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2033.codfw.wmnet with OS bullseye
21:24 dancy@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.10 refs T350086
21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2074.codfw.wmnet with OS bullseye
21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:21 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:15 ladsgroup@deploy2002: Finished scap: Backport for Protect against ParserOutput re-namespacing (T353835) (duration: 08m 13s)
21:09 ladsgroup@deploy2002: ladsgroup: Continuing with sync
21:08 ladsgroup@deploy2002: ladsgroup: Backport for Protect against ParserOutput re-namespacing (T353835) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:08 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
21:07 ladsgroup@deploy2002: Started scap: Backport for Protect against ParserOutput re-namespacing (T353835)
21:04 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
21:02 aqu@deploy2002: Finished deploy [airflow-dags/research@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 28s)
21:01 aqu@deploy2002: Started deploy [airflow-dags/research@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2074.codfw.wmnet with reason: host reimage
20:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2074.codfw.wmnet with reason: host reimage
20:49 ladsgroup@deploy2002: Finished scap: Backport for Protect against ParserOutput re-namespacing (T353835) (duration: 08m 19s)
20:47 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host restbase2033.codfw.wmnet with OS bullseye
20:47 eevans@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2033.codfw.wmnet with OS bullseye
20:43 ladsgroup@deploy2002: ladsgroup: Continuing with sync
20:42 ladsgroup@deploy2002: ladsgroup: Backport for Protect against ParserOutput re-namespacing (T353835) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:40 ladsgroup@deploy2002: Started scap: Backport for Protect against ParserOutput re-namespacing (T353835)
20:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2074.codfw.wmnet with OS bullseye
20:31 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host restbase2033.codfw.wmnet with OS bullseye
20:30 eevans@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2033.codfw.wmnet with OS bullseye
19:51 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
19:48 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2033.codfw.wmnet with reason: host reimage
19:30 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host restbase2033.codfw.wmnet with OS bullseye
19:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host wdqs1022.eqiad.wmnet
19:27 dancy@deploy2002: Finished php-fpm-restarts
19:24 dancy@deploy2002: Starting php-fpm-restarts
19:18 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.10 refs T350086
18:59 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
18:59 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
18:59 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
18:58 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
18:58 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
18:57 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
18:38 krinkle@deploy2002: Finished deploy [integration/docroot@355ddbb]: (no justification provided) (duration: 00m 07s)
18:38 krinkle@deploy2002: Started deploy [integration/docroot@355ddbb]: (no justification provided)
18:06 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
18:06 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
18:05 cmooney@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
18:05 cmooney@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
18:05 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2003
18:05 cmooney@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
17:26 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1022.eqiad.wmnet
17:25 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs1022.eqiad.wmnet
17:25 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1022.eqiad.wmnet
17:05 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
16:03 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
16:03 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
16:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be2080.codfw.wmnet with OS bullseye
15:22 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2074.codfw.wmnet with OS bullseye
15:18 Lucas_WMDE: UTC afternoon backport+config window done
15:17 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Replace $wgCommandLineMode checks with MW_ENTRY_POINT (T353751) (duration: 08m 22s)
15:15 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts wdqs1022.eqiad.wmnet
15:11 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Continuing with sync
15:10 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Backport for Replace $wgCommandLineMode checks with MW_ENTRY_POINT (T353751) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:09 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts wdqs1024.eqiad.wmnet
15:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1024.eqiad.wmnet
15:08 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Replace $wgCommandLineMode checks with MW_ENTRY_POINT (T353751)
15:06 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1022.eqiad.wmnet
15:05 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts wdqs1022.eqiad.wmnet
15:05 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1022.eqiad.wmnet
15:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs1023.eqiad.wmnet
15:05 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1023.eqiad.wmnet
15:05 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
15:04 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
15:02 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1024.eqiad.wmnet
15:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1024.eqiad.wmnet
15:01 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs1024.eqiad.wmnet
15:01 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1024.eqiad.wmnet
14:58 inflatador: bking@cumin2002 disable/mask wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-categories on wdqs102[24] T352878
14:57 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for RunSingleJob.php: Fix use of MWExceptionHandler before it's defined (T352265) (duration: 10m 30s)
14:51 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Continuing with sync
14:48 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and matmarex: Backport for RunSingleJob.php: Fix use of MWExceptionHandler before it's defined (T352265) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:46 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for RunSingleJob.php: Fix use of MWExceptionHandler before it's defined (T352265)
14:43 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Remove BetaFeature code related to ReferencePreviews (T351708), Remove wgPopupsReferencePreviews now that it defaults to true (T351708) (duration: 10m 16s)
14:37 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and awight and wmde-fisch: Continuing with sync
14:35 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and awight and wmde-fisch: Backport for Remove BetaFeature code related to ReferencePreviews (T351708), Remove wgPopupsReferencePreviews now that it defaults to true (T351708) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:33 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Remove BetaFeature code related to ReferencePreviews (T351708), Remove wgPopupsReferencePreviews now that it defaults to true (T351708)
14:30 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Check for false from ThumbnailImage::getStoragePath (T353758) (duration: 09m 38s)
14:26 lucaswerkmeister-wmde@deploy2002: kharlan and lucaswerkmeister-wmde: Continuing with sync
14:22 lucaswerkmeister-wmde@deploy2002: kharlan and lucaswerkmeister-wmde: Backport for Check for false from ThumbnailImage::getStoragePath (T353758) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:21 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Check for false from ThumbnailImage::getStoragePath (T353758)
14:19 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Make wiktionary and mw.org provide og:site_name (T348203) (duration: 15m 54s)
14:16 moritzm: installing distro-info-data updates from Bookworm point release
14:14 lucaswerkmeister-wmde@deploy2002: pols12 and lucaswerkmeister-wmde: Continuing with sync
14:12 moritzm: installing debootstrap bugfix updates from Bookworm point release
14:06 lucaswerkmeister-wmde@deploy2002: pols12 and lucaswerkmeister-wmde: Backport for Make wiktionary and mw.org provide og:site_name (T348203) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:04 moritzm: installing cups updates from Bookworm point release
14:04 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Make wiktionary and mw.org provide og:site_name (T348203)
13:38 aqu@deploy2002: Finished deploy [airflow-dags/wmde@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac513] (duration: 00m 05s)
13:38 aqu@deploy2002: Started deploy [airflow-dags/wmde@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac513]
13:38 aqu@deploy2002: Finished deploy [airflow-dags/search@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 30s)
13:37 aqu@deploy2002: Started deploy [airflow-dags/search@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
13:37 aqu@deploy2002: Finished deploy [airflow-dags/research@90f280e]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@e2ed6162] (duration: 00m 06s)
13:37 aqu@deploy2002: Started deploy [airflow-dags/research@90f280e]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@e2ed6162]
13:36 aqu@deploy2002: Finished deploy [airflow-dags/platform_eng@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 25s)
13:36 aqu@deploy2002: Started deploy [airflow-dags/platform_eng@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
13:35 aqu@deploy2002: Finished deploy [airflow-dags/analytics_product@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 09s)
13:35 aqu@deploy2002: Started deploy [airflow-dags/analytics_product@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
13:34 aqu@deploy2002: Finished deploy [airflow-dags/analytics@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 05s)
13:34 aqu@deploy2002: Started deploy [airflow-dags/analytics@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
13:34 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 11s)
13:34 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
13:32 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 01s)
13:32 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
13:31 aqu@deploy2002: Finished deploy [airflow-dags/analytics@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131] (duration: 00m 01s)
13:31 aqu@deploy2002: Started deploy [airflow-dags/analytics@d5ac513]: Make sure airflow-dags is up-to-date before activating metrics [airflow-dags@d5ac5131]
12:12 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
11:30 kostajh: T353703 Manual run: /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/mediamoderation.dblist extensions/MediaModeration/maintenance/updateMetrics.php --verbose
10:22 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on mw2448.codfw.wmnet with reason: hw failure
10:22 cgoubert@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on mw2448.codfw.wmnet with reason: hw failure
09:43 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough and A:wikidough
09:39 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh5002.wikimedia.org
09:39 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for doh5002.wikimedia.org
09:10 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh2001.wikimedia.org
09:10 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for doh2001.wikimedia.org
08:47 fabfur@cumin1001: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough and A:wikidough
06:31 ryankemper: T351671 Pooled `wdqs10[17-21]*`; data xfers completed and test queries are passing on `wdqs1018`. Will decom related hosts tomorrow (2023-12-20)
02:47 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
02:45 rzl@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
02:44 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
02:43 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
02:43 rzl@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
02:41 rzl@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
02:39 rzl@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
02:37 rzl@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
02:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
02:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
00:34 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
00:34 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
00:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
00:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
00:25 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
00:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 22:00:00 on wdqs[1017-1021].eqiad.wmnet with reason: bringing new wdqs hosts online T351671
00:02 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 22:00:00 on wdqs[1017-1021].eqiad.wmnet with reason: bringing new wdqs hosts online T351671

2023-12-19

22:55 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
22:54 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
22:53 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
22:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on wdqs[1017-1021].eqiad.wmnet with reason: bringing new wdqs hosts online T351671
22:26 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on wdqs[1017-1021].eqiad.wmnet with reason: bringing new wdqs hosts online T351671
21:43 mforns@deploy2002: Finished deploy [airflow-dags/wmde@d5ac513]: (no justification provided) (duration: 00m 11s)
21:43 mforns@deploy2002: Started deploy [airflow-dags/wmde@d5ac513]: (no justification provided)
21:43 mforns@deploy2002: Finished deploy [airflow-dags/analytics@d5ac513]: (no justification provided) (duration: 00m 27s)
21:43 mforns@deploy2002: Started deploy [airflow-dags/analytics@d5ac513]: (no justification provided)
21:39 ladsgroup@deploy2002: Finished scap: Backport for Disable listings extension in more wikis (T253216) (duration: 07m 42s)
21:33 ladsgroup@deploy2002: ladsgroup: Continuing with sync
21:32 ladsgroup@deploy2002: ladsgroup: Backport for Disable listings extension in more wikis (T253216) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:31 ladsgroup@deploy2002: Started scap: Backport for Disable listings extension in more wikis (T253216)
21:26 kostajh: UTC late deploys done
21:26 kharlan@deploy2002: Finished scap: Backport for Undeploy Annual Plan Core Metrics survey (T351353) (duration: 10m 00s)
21:20 kharlan@deploy2002: kharlan and dani: Continuing with sync
21:17 kharlan@deploy2002: kharlan and dani: Backport for Undeploy Annual Plan Core Metrics survey (T351353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:16 kharlan@deploy2002: Started scap: Backport for Undeploy Annual Plan Core Metrics survey (T351353)
21:14 kharlan@deploy2002: Finished scap: Backport for MediaModeration: Add dblist (T353703) (duration: 07m 44s)
21:08 kharlan@deploy2002: kharlan: Continuing with sync
21:08 kharlan@deploy2002: kharlan: Backport for MediaModeration: Add dblist (T353703) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:06 kharlan@deploy2002: Started scap: Backport for MediaModeration: Add dblist (T353703)
19:10 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.10 refs T350086
18:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testhost2001.codfw.wmnet with OS bullseye
18:56 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
18:49 mforns@deploy2002: Finished deploy [analytics/refinery@28dccef] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@28dccefe] (duration: 00m 05s)
18:48 mforns@deploy2002: Started deploy [analytics/refinery@28dccef] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@28dccefe]
18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
18:43 mforns@deploy2002: Finished deploy [analytics/refinery@28dccef] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@28dccefe] (duration: 03m 16s)
18:39 mforns@deploy2002: Started deploy [analytics/refinery@28dccef] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@28dccefe]
18:39 mforns@deploy2002: Finished deploy [analytics/refinery@28dccef] (thin): Regular analytics weekly train THIN [analytics/refinery@28dccefe] (duration: 00m 06s)
18:39 mforns@deploy2002: Started deploy [analytics/refinery@28dccef] (thin): Regular analytics weekly train THIN [analytics/refinery@28dccefe]
18:39 mforns@deploy2002: Finished deploy [analytics/refinery@28dccef]: Regular analytics weekly train [analytics/refinery@28dccefe] (duration: 09m 18s)
18:29 mforns@deploy2002: Started deploy [analytics/refinery@28dccef]: Regular analytics weekly train [analytics/refinery@28dccefe]
18:29 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@d275e4f]: Deploy latest DAG changes to Analytics Airflow instance (duration: 00m 31s)
18:28 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@d275e4f]: Deploy latest DAG changes to Analytics Airflow instance
18:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testhost2001.codfw.wmnet with reason: host reimage
18:22 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testhost2001.codfw.wmnet with reason: host reimage
18:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bullseye
18:06 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host testhost2001.codfw.wmnet with OS bullseye
17:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bullseye
16:23 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
16:15 aikochou@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
16:12 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on moss-be[2001-2003].codfw.wmnet with reason: not in service, being used to test a destructive cookbook
16:12 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on moss-be[2001-2003].codfw.wmnet with reason: not in service, being used to test a destructive cookbook
16:04 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 327700
16:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 327700
16:02 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 139901
16:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 139901
16:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15133
15:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15133
15:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5398
15:55 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw2448.codfw.wmnet with reason: hw failure
15:55 cgoubert@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw2448.codfw.wmnet with reason: hw failure
15:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 5398
15:42 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Change virtual domain of botpassword to plural (T351559) (duration: 07m 01s)
15:38 moritzm: installing gnutls28 security updates on bookworm
15:37 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and ladsgroup: Continuing with sync
15:37 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and ladsgroup: Backport for Change virtual domain of botpassword to plural (T351559) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:35 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Change virtual domain of botpassword to plural (T351559)
15:33 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Use main replica DB in importExistingFilesToScanTable.php (duration: 07m 47s)
15:27 lucaswerkmeister-wmde@deploy2002: kharlan and lucaswerkmeister-wmde: Continuing with sync
15:27 lucaswerkmeister-wmde@deploy2002: kharlan and lucaswerkmeister-wmde: Backport for Use main replica DB in importExistingFilesToScanTable.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:25 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Use main replica DB in importExistingFilesToScanTable.php
15:23 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: host is down, downtiming in icinga too
15:23 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: host is down, downtiming in icinga too
15:22 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Make SearchEntitiesIntegrationTest an ApiTestCase (T353334), Use link batch in search APIs (T353334) (duration: 08m 49s)
15:16 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
15:15 moritzm: installing exim4 bugfix updates from Bookworm point release
15:15 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Make SearchEntitiesIntegrationTest an ApiTestCase (T353334), Use link batch in search APIs (T353334) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:13 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Make SearchEntitiesIntegrationTest an ApiTestCase (T353334), Use link batch in search APIs (T353334)
15:10 moritzm: installing nagios-plugins-contrib bugfix updates from Bookworm point release
14:44 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
14:43 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
14:43 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
14:42 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
14:33 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
14:32 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
14:31 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
14:30 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
14:29 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
14:29 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
14:25 Lucas_WMDE: UTC afternoon backport+config window done
14:25 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Send PhotoDNA the mime type of the thumbnail and not original file (T351401), Add maintenance script to scan files in the mediamoderation_scan table (T351399) (duration: 07m 53s)
14:24 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
14:24 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
14:24 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:24 kamila@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:22 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
14:21 kamila@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
14:21 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:19 kamila@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:19 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and kharlan: Continuing with sync
14:18 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and kharlan: Backport for Send PhotoDNA the mime type of the thumbnail and not original file (T351401), Add maintenance script to scan files in the mediamoderation_scan table (T351399) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:17 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Send PhotoDNA the mime type of the thumbnail and not original file (T351401), Add maintenance script to scan files in the mediamoderation_scan table (T351399)
14:15 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for testwiki: enable revertrisk model in ores extension (T348298) (duration: 10m 22s)
14:10 lucaswerkmeister-wmde@deploy2002: isaranto and lucaswerkmeister-wmde: Continuing with sync
14:08 lucaswerkmeister-wmde@deploy2002: isaranto and lucaswerkmeister-wmde: Backport for testwiki: enable revertrisk model in ores extension (T348298) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:05 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for testwiki: enable revertrisk model in ores extension (T348298)
13:45 jgiannelos@deploy2002: Finished deploy [restbase/deploy@40c15b1]: (no justification provided) (duration: 27m 26s)
13:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Release v0.6.5 - ayounsi@cumin1001
13:35 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Release v0.6.5 - ayounsi@cumin1001
13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin1001.eqiad.wmnet with reason: Release v0.6.5 - ayounsi@cumin1001
13:32 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin1001.eqiad.wmnet with reason: Release v0.6.5 - ayounsi@cumin1001
13:17 jgiannelos@deploy2002: Started deploy [restbase/deploy@40c15b1]: (no justification provided)
13:12 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
13:08 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:08 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:05 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:05 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:02 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
12:24 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
12:24 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
12:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: WIP
12:21 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: WIP
11:31 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
10:46 moritzm: installing perl security updates on bookworm
10:19 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
10:14 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
10:14 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
09:45 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
09:45 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
09:23 elukey: reload thanos-rule on titan2001
08:27 jmm@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lists1003.wikimedia.org
08:27 jmm@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:27 jmm@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lists1003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin1002"
08:26 jmm@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lists1003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin1002"
08:22 jmm@cumin1002: START - Cookbook sre.dns.netbox
08:17 jmm@cumin1002: START - Cookbook sre.hosts.decommission for hosts lists1003.wikimedia.org
06:13 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
06:07 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
05:10 kart_: Updated MinT to 2023-12-12-065316-production
04:56 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
04:54 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.10 refs T350086 (duration: 51m 03s)
04:49 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
04:49 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
04:43 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
04:40 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
04:36 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
04:09 cstone: civicrm upgraded from e2d49d10 to c3cc80c7
04:03 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.10 refs T350086

2023-12-18

23:40 taavi: conftool codfw/appserver/nginx/mw2448.codfw.wmnet: pooled changed yes => inactive # T353679, not sure why it was not logged automatically
22:35 maryum: Deployed patch for T347704
22:08 dancy: UTC late backport window completed.
22:07 dancy@deploy2002: Finished scap: Backport for Revert "Fix English Gboard backspace over aliens" (T353578 T325129), Revert "Put zero-width space after inline focusable nodes" (T353578 T330284), Update VE core submodule to wmf.9 (6bada65) (T353578 T330284 T325129) (duration: 13m 34s)
21:57 dancy@deploy2002: dancy and kemayo: Continuing with sync
21:56 dancy@deploy2002: dancy and kemayo: Backport for Revert "Fix English Gboard backspace over aliens" (T353578 T325129), Revert "Put zero-width space after inline focusable nodes" (T353578 T330284), Update VE core submodule to wmf.9 (6bada65) (T353578 T330284 T325129) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:54 dancy@deploy2002: Started scap: Backport for Revert "Fix English Gboard backspace over aliens" (T353578 T325129), Revert "Put zero-width space after inline focusable nodes" (T353578 T330284), Update VE core submodule to wmf.9 (6bada65) (T353578 T330284 T325129)
21:17 dancy@deploy2002: Finished scap: Backport for Undeploy Reader Demographics 2 survey (T344393) (duration: 08m 30s)
21:11 dancy@deploy2002: dani and dancy: Continuing with sync
21:10 dancy@deploy2002: dani and dancy: Backport for Undeploy Reader Demographics 2 survey (T344393) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:09 dancy@deploy2002: Started scap: Backport for Undeploy Reader Demographics 2 survey (T344393)
21:05 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
21:05 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
21:04 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
21:04 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
21:03 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
21:03 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
21:01 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
21:01 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
20:53 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:53 otto@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
20:52 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:52 otto@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
20:48 otto@deploy2002: Synchronized wmf-config/ext-EventStreamConfig.php: Config: Add message_key_fields to page_content_change stream (T338231) (duration: 06m 32s)
20:31 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:31 otto@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
20:19 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:19 otto@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
17:14 inflatador: bking@kafka-jumbo1007 kafka topics --alter --topic codfw.mediawiki.cirrussearch.page_rerender.v1 --partitions 5 T351503
17:12 inflatador: bking@kafka-jumbo1007 kafka topics --alter --topic eqiad.mediawiki.cirrussearch.page_rerender.v1 --partitions 5 T351503
17:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2074.codfw.wmnet with OS bullseye
16:56 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:56 akosiaris@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to mc-gp[12]00[123] - akosiaris@cumin1001"
16:55 akosiaris@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to mc-gp[12]00[123] - akosiaris@cumin1001"
16:54 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 28s)
16:52 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
16:52 akosiaris@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
16:52 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
16:48 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 08s)
16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2076']
16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2075']
16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2074']
16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2079']
16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2077']
16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2080']
16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2078']
16:35 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2080']
16:35 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2080
16:34 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:34 akosiaris@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to mc2042-mc2055 - akosiaris@cumin1001"
16:33 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2080
16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be1077
16:33 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be1077
16:33 akosiaris@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to mc2042-mc2055 - akosiaris@cumin1001"
16:31 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
16:28 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
16:28 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
16:25 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
16:25 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
16:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be2080']
16:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2080']
16:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2079']
16:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2078']
16:22 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2077']
16:22 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2076']
16:21 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2075']
16:21 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2074']
16:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2079']
16:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2080']
16:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2079']
16:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be2079']
16:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2080']
16:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2078']
16:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2077']
16:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2076']
16:18 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2076']
16:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2075']
16:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2074']
16:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2079']
16:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2079']
16:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2078']
16:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2077']
16:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2075']
16:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2074']
16:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2080.mgmt.codfw.wmnet with reboot policy FORCED
16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2079.mgmt.codfw.wmnet with reboot policy FORCED
16:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2076.mgmt.codfw.wmnet with reboot policy FORCED
16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2078.mgmt.codfw.wmnet with reboot policy FORCED
16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2077.mgmt.codfw.wmnet with reboot policy FORCED
16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2075.mgmt.codfw.wmnet with reboot policy FORCED
16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2074.mgmt.codfw.wmnet with reboot policy FORCED
15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2080.mgmt.codfw.wmnet with reboot policy FORCED
15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2079.mgmt.codfw.wmnet with reboot policy FORCED
15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2078.mgmt.codfw.wmnet with reboot policy FORCED
15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2077.mgmt.codfw.wmnet with reboot policy FORCED
15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2076.mgmt.codfw.wmnet with reboot policy FORCED
15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2075.mgmt.codfw.wmnet with reboot policy FORCED
15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2074.mgmt.codfw.wmnet with reboot policy FORCED
15:42 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2074-2080 to codfw - jhancock@cumin2002"
15:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2074-2080 to codfw - jhancock@cumin2002"
15:37 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
15:36 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
15:16 fabfur: repooling cp4037 (T352876)
15:16 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp4037.ulsfo.wmnet
15:16 fabfur@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp4037.ulsfo.wmnet
15:04 urbanecm@deploy2002: Finished scap: Backport for Configure and enable StatsLib for production (T343024), Revert "util.main: Don't use mw.Map(), use a native Map() instead" (T353571 T353076) (duration: 10m 20s)
14:58 urbanecm@deploy2002: cwhite and urbanecm and chlod: Continuing with sync
14:55 urbanecm@deploy2002: cwhite and urbanecm and chlod: Backport for Configure and enable StatsLib for production (T343024), Revert "util.main: Don't use mw.Map(), use a native Map() instead" (T353571 T353076) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:53 urbanecm@deploy2002: Started scap: Backport for Configure and enable StatsLib for production (T343024), Revert "util.main: Don't use mw.Map(), use a native Map() instead" (T353571 T353076)
14:52 urbanecm@deploy2002: Finished scap: Backport for Enable action blocks for zhwiki (T353120) (duration: 08m 58s)
14:47 urbanecm@deploy2002: milkydefer and urbanecm: Continuing with sync
14:45 moritzm: installing nagios-plugins-contrib bugfix updates from Bookworm point release
14:45 moritzm: installing nagios-plugins-contrib bugfix updates
14:44 urbanecm@deploy2002: milkydefer and urbanecm: Backport for Enable action blocks for zhwiki (T353120) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:44 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@d275e4f]: (no justification provided) (duration: 00m 32s)
14:44 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@d275e4f]: (no justification provided)
14:43 urbanecm@deploy2002: Started scap: Backport for Enable action blocks for zhwiki (T353120)
14:43 urbanecm@deploy2002: Finished scap: Backport for Add a testing stream for page-prediction-change events (T349919), CheckUser: Enable read new for event tables migration everywhere (T341829) (duration: 19m 00s)
14:37 urbanecm@deploy2002: dreamyjazz and aikochou and urbanecm: Continuing with sync
14:36 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
14:35 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
14:34 urbanecm@deploy2002: dreamyjazz and aikochou and urbanecm: Backport for Add a testing stream for page-prediction-change events (T349919), CheckUser: Enable read new for event tables migration everywhere (T341829) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:24 urbanecm@deploy2002: Started scap: Backport for Add a testing stream for page-prediction-change events (T349919), CheckUser: Enable read new for event tables migration everywhere (T341829)
14:13 moritzm: installing node-undici security updates
13:15 moritzm: installing intel-microcode security updates on buster hosts
13:08 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS bullseye
12:56 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
12:55 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
12:52 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
12:51 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
12:50 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
12:50 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
12:45 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
12:41 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
12:27 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-canary
12:26 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-canary
12:26 kamila@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
12:25 kamila@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
12:24 kamila@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
12:23 kamila@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
12:20 kamila@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
12:20 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
12:20 kamila@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
12:19 fabfur@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
12:19 kamila@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:18 kamila@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:14 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
12:13 kamila@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
12:12 Emperor: restart swift-proxy and envoyproxy on ms-fe1012
12:10 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:09 kamila@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:04 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
12:03 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
12:01 moritzm: installing ncurses security updates
11:59 kamila@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
11:58 kamila@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
11:51 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
11:51 fabfur@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
11:41 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
11:41 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
11:39 moritzm: installing qemu security updates on bookworm
11:38 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
11:37 fabfur@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
11:36 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
11:36 fabfur@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
10:56 moritzm: restarting apache/FPM on mw canaries to pick up gnutls update
10:52 moritzm: installing gnutls28 security updates
10:47 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
10:44 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
10:39 moritzm: installing jetty9 security updates
10:29 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
10:29 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
10:17 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
10:13 XioNoX: remove VRRP pinning on cr1-eqiad/cr2-eqiad/cr2-codfw
10:09 moritzm: installing Linux 6.1.67 updates on Bookworm hosts
09:45 XioNoX: make eqiad-codfw 100G link primary
09:10 vgutierrez: vgutierrez@acmechief1002:~$ sudo -i keyholder arm - T352242

2023-12-17

12:59 elukey: restart kubelet on ml-serve1001 (errors while syncing old containers)

2023-12-16

01:21 eevans@deploy2002: Finished deploy [cassandra/logstash-logback-encoder@fb10de1]: (no justification provided) (duration: 00m 10s)
01:21 eevans@deploy2002: Started deploy [cassandra/logstash-logback-encoder@fb10de1]: (no justification provided)
00:44 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@63804c4]: (no justification provided) (duration: 00m 25s)
00:44 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@63804c4]: (no justification provided)
00:05 jhathaway: unbreaking my puppet change with https://gerrit.wikimedia.org/r/c/operations/puppet/+/983504

2023-12-15

23:46 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@9600237]: (no justification provided) (duration: 00m 27s)
23:46 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@9600237]: (no justification provided)
23:06 milimetric@deploy2002: Finished deploy [airflow-dags/platform_eng@160d0f0]: (no justification provided) (duration: 00m 25s)
23:06 milimetric@deploy2002: Started deploy [airflow-dags/platform_eng@160d0f0]: (no justification provided)
22:42 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:42 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
22:03 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@5090fdc]: (no justification provided) (duration: 00m 25s)
22:03 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@5090fdc]: (no justification provided)
21:48 milimetric@deploy2002: Finished deploy [analytics/refinery@eeb98ac] (thin): Syncing changes to HDFS (duration: 00m 06s)
21:48 milimetric@deploy2002: Started deploy [analytics/refinery@eeb98ac] (thin): Syncing changes to HDFS
21:48 milimetric@deploy2002: Finished deploy [analytics/refinery@eeb98ac]: Syncing changes to HDFS (duration: 81m 46s)
21:26 mutante: running puppet on all prometheus*
20:26 milimetric@deploy2002: Started deploy [analytics/refinery@eeb98ac]: Syncing changes to HDFS
15:44 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
15:25 klausman@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
15:01 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
15:00 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
14:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
14:46 arnaudb@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 100%: candidate master repooling', diff saved to https://phabricator.wikimedia.org/P54482 and previous config saved to /var/cache/conftool/dbconfig/20231215-144624-arnaudb.json
14:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
14:45 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
14:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
14:40 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:39 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: candidate master proper repooling', diff saved to https://phabricator.wikimedia.org/P54481 and previous config saved to /var/cache/conftool/dbconfig/20231215-143812-arnaudb.json
14:31 arnaudb@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 80%: candidate master repooling', diff saved to https://phabricator.wikimedia.org/P54480 and previous config saved to /var/cache/conftool/dbconfig/20231215-143118-arnaudb.json
14:27 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
14:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on db2194.codfw.wmnet with reason: production freeze will occur before cookbook is finished
14:27 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
14:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on db2194.codfw.wmnet with reason: production freeze will occur before cookbook is finished
14:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: candidate master proper repooling', diff saved to https://phabricator.wikimedia.org/P54479 and previous config saved to /var/cache/conftool/dbconfig/20231215-142307-arnaudb.json
14:16 arnaudb@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 40%: candidate master repooling', diff saved to https://phabricator.wikimedia.org/P54478 and previous config saved to /var/cache/conftool/dbconfig/20231215-141613-arnaudb.json
14:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 50%: candidate master proper repooling', diff saved to https://phabricator.wikimedia.org/P54477 and previous config saved to /var/cache/conftool/dbconfig/20231215-140802-arnaudb.json
14:07 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
14:07 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
14:01 arnaudb@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 20%: candidate master repooling', diff saved to https://phabricator.wikimedia.org/P54476 and previous config saved to /var/cache/conftool/dbconfig/20231215-140108-arnaudb.json
13:54 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
13:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
13:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 25%: candidate master proper repooling', diff saved to https://phabricator.wikimedia.org/P54475 and previous config saved to /var/cache/conftool/dbconfig/20231215-135257-arnaudb.json
13:52 arnaudb@cumin1001: dbctl commit (dc=all): 'depool db2179 to repool w/ api', diff saved to https://phabricator.wikimedia.org/P54474 and previous config saved to /var/cache/conftool/dbconfig/20231215-135228-arnaudb.json
13:46 arnaudb@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 10%: candidate master repooling', diff saved to https://phabricator.wikimedia.org/P54473 and previous config saved to /var/cache/conftool/dbconfig/20231215-134603-arnaudb.json
13:39 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: Test upgrade GitLab Replica with insufficient API key
13:39 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Test upgrade GitLab Replica with insufficient API key
12:55 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
12:55 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
12:25 hashar@deploy2002: Finished deploy [integration/docroot@7f6c112]: doc: add integration/tox-jenkins-override - T353515 (duration: 00m 06s)
12:25 hashar@deploy2002: Started deploy [integration/docroot@7f6c112]: doc: add integration/tox-jenkins-override - T353515
11:28 hashar@deploy2002: Finished deploy [gerrit/gerrit@304c63a]: wm-pcc: only act on Puppet repositories - T353181 (duration: 00m 08s)
11:28 hashar@deploy2002: Started deploy [gerrit/gerrit@304c63a]: wm-pcc: only act on Puppet repositories - T353181
10:56 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
10:54 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
10:52 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
09:05 moritzm: installing Linux 6.1.67 packages on Bookworm hosts
08:56 XioNoX: shutdown already down IPv6 BGP session from ulsfo to the office

2023-12-14

23:17 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host acmechief1002.eqiad.wmnet with OS bookworm
23:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief1002.eqiad.wmnet with reason: host reimage
22:57 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief1002.eqiad.wmnet with reason: host reimage
22:48 brett@cumin2002: START - Cookbook sre.hosts.reimage for host acmechief1002.eqiad.wmnet with OS bookworm
21:24 ssastry@deploy2002: Finished scap: Backport for Revert "Temporarily disable isPreview in Parsoid's rendering" (duration: 10m 38s)
21:18 ssastry@deploy2002: ssastry: Continuing with sync
21:14 ssastry@deploy2002: ssastry: Backport for Revert "Temporarily disable isPreview in Parsoid's rendering" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:13 ssastry@deploy2002: Started scap: Backport for Revert "Temporarily disable isPreview in Parsoid's rendering"
20:52 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
20:51 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
20:51 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
20:51 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
20:51 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
20:50 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
20:50 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
20:50 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
20:50 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
20:49 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
20:48 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
20:48 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
20:48 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
20:47 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
20:47 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
20:46 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
20:46 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
20:46 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
20:45 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
20:45 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
20:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wdqs[1009-1010].eqiad.wmnet
20:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[1009-1010].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
20:40 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
20:39 bd808@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
20:39 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
20:39 bd808@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
20:39 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
20:38 bd808@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
20:38 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
20:38 bd808@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
20:38 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
20:37 ryankemper@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[1009-1010].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
20:37 bd808@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
20:31 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
20:23 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts wdqs[1009-1010].eqiad.wmnet
20:06 jmm@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM moscovium.eqiad.wmnet
20:02 jmm@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM moscovium.eqiad.wmnet
19:12 brennen@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.9 refs T350085
19:03 brennen: 1.42.0-wmf.9 (T350085) status: no current blockers, although we should keep an eye on T353400. rolling to all wikis.
18:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54462 and previous config saved to /var/cache/conftool/dbconfig/20231214-183508-arnaudb.json
18:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54461 and previous config saved to /var/cache/conftool/dbconfig/20231214-183459-arnaudb.json
18:20 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54460 and previous config saved to /var/cache/conftool/dbconfig/20231214-182003-arnaudb.json
18:19 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54459 and previous config saved to /var/cache/conftool/dbconfig/20231214-181954-arnaudb.json
18:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54458 and previous config saved to /var/cache/conftool/dbconfig/20231214-180458-arnaudb.json
18:04 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54457 and previous config saved to /var/cache/conftool/dbconfig/20231214-180449-arnaudb.json
17:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54456 and previous config saved to /var/cache/conftool/dbconfig/20231214-174953-arnaudb.json
17:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54455 and previous config saved to /var/cache/conftool/dbconfig/20231214-174944-arnaudb.json
17:34 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54453 and previous config saved to /var/cache/conftool/dbconfig/20231214-173448-arnaudb.json
17:34 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54452 and previous config saved to /var/cache/conftool/dbconfig/20231214-173439-arnaudb.json
17:24 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:23 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
17:19 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54451 and previous config saved to /var/cache/conftool/dbconfig/20231214-171943-arnaudb.json
17:19 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54450 and previous config saved to /var/cache/conftool/dbconfig/20231214-171934-arnaudb.json
17:04 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 8%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54449 and previous config saved to /var/cache/conftool/dbconfig/20231214-170438-arnaudb.json
17:04 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 8%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54448 and previous config saved to /var/cache/conftool/dbconfig/20231214-170428-arnaudb.json
16:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 4%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54446 and previous config saved to /var/cache/conftool/dbconfig/20231214-164925-arnaudb.json
16:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 4%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54445 and previous config saved to /var/cache/conftool/dbconfig/20231214-164921-arnaudb.json
16:43 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
16:43 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
16:43 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
16:42 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
16:42 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
16:42 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
16:34 arnaudb@cumin1001: dbctl commit (dc=all): 'db1249 (re)pooling @ 2%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54444 and previous config saved to /var/cache/conftool/dbconfig/20231214-163420-arnaudb.json
16:34 arnaudb@cumin1001: dbctl commit (dc=all): 'db1234 (re)pooling @ 2%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54443 and previous config saved to /var/cache/conftool/dbconfig/20231214-163416-arnaudb.json
16:24 akosiaris: updates of all wikikube services done T352906
16:20 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
16:18 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
16:18 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
16:18 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
16:17 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
16:17 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
16:17 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/similar-users: apply
16:17 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/similar-users: apply
16:17 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/similar-users: apply
16:16 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
16:16 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/similar-users: apply
16:16 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/similar-users: apply
16:16 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/similar-users: apply
16:15 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
16:15 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
16:14 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
16:14 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
16:14 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
16:14 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
16:14 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
16:13 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
16:13 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
16:13 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
16:12 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
16:12 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
16:12 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
16:11 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
16:11 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
16:11 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
16:10 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
16:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
16:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
16:09 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
16:09 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
16:09 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
16:09 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
16:08 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
16:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
16:08 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
16:08 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
16:07 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
16:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
16:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
16:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
16:06 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
16:06 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
16:06 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
16:06 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
16:05 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
16:05 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
16:05 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
16:05 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
16:04 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
16:04 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: apply
16:04 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: apply
16:04 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: apply
16:03 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
16:03 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
16:03 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: apply
16:03 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
16:03 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
16:02 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: apply
16:02 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
16:02 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
16:02 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
16:02 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
16:02 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
16:01 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
16:01 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
16:01 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
16:00 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
16:00 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
16:00 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
15:59 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
15:59 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
15:59 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
15:59 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
15:58 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
15:58 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
15:57 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
15:57 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
15:57 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
15:57 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
15:57 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
15:57 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
15:57 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
15:56 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
15:56 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
15:55 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
15:55 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
15:54 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
15:54 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts planet2002.codfw.wmnet
15:54 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:54 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: planet2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
15:54 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
15:54 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
15:54 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
15:54 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
15:54 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
15:53 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
15:53 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
15:53 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/mathoid: apply
15:53 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
15:53 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/media-analytics: apply
15:53 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
15:53 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: planet2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
15:53 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
15:52 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
15:52 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
15:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
15:51 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
15:51 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
15:51 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
15:51 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
15:51 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
15:51 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
15:51 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
15:50 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
15:50 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
15:50 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
15:50 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
15:50 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
15:50 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
15:50 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
15:50 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
15:49 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
15:49 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
15:49 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
15:49 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
15:49 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
15:49 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
15:48 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
15:48 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
15:48 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
15:48 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
15:48 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
15:48 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
15:46 dzahn@cumin2002: START - Cookbook sre.dns.netbox
15:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1149.eqiad.wmnet onto db1249.eqiad.wmnet
15:42 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
15:42 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
15:42 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
15:42 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
15:42 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
15:42 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
15:42 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts planet2002.codfw.wmnet
15:40 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
15:40 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
15:40 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
15:40 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
15:40 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
15:39 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
15:35 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
15:35 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
15:31 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@4946bb7]: (no justification provided) (duration: 00m 48s)
15:30 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@4946bb7]: (no justification provided)
15:29 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:28 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
15:28 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:28 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:28 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
15:28 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
15:28 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
15:28 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
15:28 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
15:28 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
15:28 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
15:27 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
15:27 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
15:27 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
15:27 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
15:27 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
15:17 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
15:17 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
15:16 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
15:16 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
15:15 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
14:46 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
14:45 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
14:45 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
14:45 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
14:44 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
14:44 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
14:43 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:43 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:22 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:22 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:07 moritzm: installing ruby-rails-html-sanitizer security updates
14:01 moritzm: installing ruby-loofah security updates
13:56 moritzm: installing reportbug bugfix updates on buster
13:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1137.eqiad.wmnet
13:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:54 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
13:53 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
13:52 moritzm: installing netty security updates
13:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1148.eqiad.wmnet
13:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:51 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
13:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1132.eqiad.wmnet
13:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1132.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
13:50 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1132.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
13:48 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
13:44 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1148.eqiad.wmnet
13:43 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1137.eqiad.wmnet
13:42 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1132.eqiad.wmnet
13:42 arnaudb@cumin1001: dbctl commit (dc=all): 'decommissioning hosts', diff saved to https://phabricator.wikimedia.org/P54437 and previous config saved to /var/cache/conftool/dbconfig/20231214-134203-arnaudb.json
13:21 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1134.eqiad.wmnet onto db1234.eqiad.wmnet
13:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1134 in db1234 for T344036', diff saved to https://phabricator.wikimedia.org/P54436 and previous config saved to /var/cache/conftool/dbconfig/20231214-131913-arnaudb.json
13:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: provisioning db1234.eqiad.wmnet - T344036
13:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: provisioning db1234.eqiad.wmnet - T344036
13:17 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: provisioning db1234.eqiad.wmnet - T344036
13:17 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: provisioning db1234.eqiad.wmnet - T344036
13:12 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1149.eqiad.wmnet onto db1249.eqiad.wmnet
13:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1149 in db1249 for T344036', diff saved to https://phabricator.wikimedia.org/P54435 and previous config saved to /var/cache/conftool/dbconfig/20231214-131017-arnaudb.json
13:09 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: provisioning db1249.eqiad.wmnet - T344036
13:09 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: provisioning db1249.eqiad.wmnet - T344036
13:09 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: provisioning db1249.eqiad.wmnet - T344036
13:08 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: provisioning db1249.eqiad.wmnet - T344036
12:45 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
12:45 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
12:42 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
12:10 cgoubert@deploy2002: Finished scap: Deploying php-fpm-exporter 0.0.3 - 982431, mw-api-int: replicas x125% - 982841 (duration: 04m 16s)
12:05 cgoubert@deploy2002: Started scap: Deploying php-fpm-exporter 0.0.3 - 982431, mw-api-int: replicas x125% - 982841
12:03 cgoubert@deploy2002: sync-world aborted: Deploying php-fpm-exporter 0.0.3 - 982431, mw-api-int: replicas x125% - 982841 (duration: 00m 02s)
12:03 cgoubert@deploy2002: Started scap: Deploying php-fpm-exporter 0.0.3 - 982431, mw-api-int: replicas x125% - 982841
12:01 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
12:01 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
11:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54434 and previous config saved to /var/cache/conftool/dbconfig/20231214-115332-arnaudb.json
11:51 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
11:49 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster1002.eqiad.wmnet
11:42 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster1002.eqiad.wmnet
11:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54433 and previous config saved to /var/cache/conftool/dbconfig/20231214-113826-arnaudb.json
11:31 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster1001.eqiad.wmnet
11:30 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2002.codfw.wmnet
11:25 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster1001.eqiad.wmnet
11:24 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2002.codfw.wmnet
11:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54432 and previous config saved to /var/cache/conftool/dbconfig/20231214-112321-arnaudb.json
11:12 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica to new version
11:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54431 and previous config saved to /var/cache/conftool/dbconfig/20231214-110816-arnaudb.json
11:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54430 and previous config saved to /var/cache/conftool/dbconfig/20231214-110754-arnaudb.json
11:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54429 and previous config saved to /var/cache/conftool/dbconfig/20231214-110733-arnaudb.json
11:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54428 and previous config saved to /var/cache/conftool/dbconfig/20231214-110714-arnaudb.json
11:06 _joe_: restarted apache2 on lists1001
10:58 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 100%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54427 and previous config saved to /var/cache/conftool/dbconfig/20231214-105814-arnaudb.json
10:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54426 and previous config saved to /var/cache/conftool/dbconfig/20231214-105311-arnaudb.json
10:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54425 and previous config saved to /var/cache/conftool/dbconfig/20231214-105248-arnaudb.json
10:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54424 and previous config saved to /var/cache/conftool/dbconfig/20231214-105228-arnaudb.json
10:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 75%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54423 and previous config saved to /var/cache/conftool/dbconfig/20231214-105209-arnaudb.json
10:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update codfw-eqiad transport ptr - ayounsi@cumin1001"
10:45 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update codfw-eqiad transport ptr - ayounsi@cumin1001"
10:43 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 90%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54422 and previous config saved to /var/cache/conftool/dbconfig/20231214-104308-arnaudb.json
10:42 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
10:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 15%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54421 and previous config saved to /var/cache/conftool/dbconfig/20231214-103806-arnaudb.json
10:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54420 and previous config saved to /var/cache/conftool/dbconfig/20231214-103743-arnaudb.json
10:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54419 and previous config saved to /var/cache/conftool/dbconfig/20231214-103723-arnaudb.json
10:37 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54418 and previous config saved to /var/cache/conftool/dbconfig/20231214-103704-arnaudb.json
10:28 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 80%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54417 and previous config saved to /var/cache/conftool/dbconfig/20231214-102803-arnaudb.json
10:26 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
10:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54416 and previous config saved to /var/cache/conftool/dbconfig/20231214-102301-arnaudb.json
10:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54415 and previous config saved to /var/cache/conftool/dbconfig/20231214-102238-arnaudb.json
10:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54414 and previous config saved to /var/cache/conftool/dbconfig/20231214-102218-arnaudb.json
10:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 25%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54413 and previous config saved to /var/cache/conftool/dbconfig/20231214-102159-arnaudb.json
10:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new cumin1002 host - jmm@cumin2002"
10:16 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new cumin1002 host - jmm@cumin2002"
10:14 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
10:14 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
10:14 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
10:14 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
10:14 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
10:13 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
10:12 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 70%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54412 and previous config saved to /var/cache/conftool/dbconfig/20231214-101258-arnaudb.json
10:12 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
10:12 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
10:11 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
10:11 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
10:11 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
10:11 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
10:08 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1233 (re)pooling @ 5%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54411 and previous config saved to /var/cache/conftool/dbconfig/20231214-100756-arnaudb.json
10:07 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
10:07 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54410 and previous config saved to /var/cache/conftool/dbconfig/20231214-100733-arnaudb.json
10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54409 and previous config saved to /var/cache/conftool/dbconfig/20231214-100713-arnaudb.json
10:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
10:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
10:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54408 and previous config saved to /var/cache/conftool/dbconfig/20231214-100654-arnaudb.json
10:06 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
10:05 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
10:05 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
10:05 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
10:04 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
10:04 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
10:04 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
10:00 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
09:59 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
09:59 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
09:58 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
09:58 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
09:58 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
09:57 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 60%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54407 and previous config saved to /var/cache/conftool/dbconfig/20231214-095753-arnaudb.json
09:56 godog: remove >= 3 months old thanos blocks for prometheus/ops in eqiad/codfw and only for a single replica - T351927
09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1248 (re)pooling @ 5%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54406 and previous config saved to /var/cache/conftool/dbconfig/20231214-095228-arnaudb.json
09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1237 (re)pooling @ 5%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54405 and previous config saved to /var/cache/conftool/dbconfig/20231214-095208-arnaudb.json
09:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1232 (re)pooling @ 5%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54404 and previous config saved to /var/cache/conftool/dbconfig/20231214-095149-arnaudb.json
09:51 hashar: Restarting CI Jenkins
09:49 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
09:49 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
09:49 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
09:49 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
09:49 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
09:48 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
09:42 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 50%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54402 and previous config saved to /var/cache/conftool/dbconfig/20231214-094248-arnaudb.json
09:40 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
09:39 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
09:39 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
09:38 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
09:38 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
09:38 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cumin1002.eqiad.wmnet with OS bullseye
09:30 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica to new version
09:27 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 40%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54401 and previous config saved to /var/cache/conftool/dbconfig/20231214-092743-arnaudb.json
09:27 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
09:27 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cumin1002.eqiad.wmnet with reason: host reimage
09:25 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
09:24 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
09:24 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
09:24 akosiaris: update all the other services. T352906
09:24 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
09:24 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
09:24 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
09:22 godog: delete raw replica blocks for prometheus/ops (only one replica) in codfw - T351927
09:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cumin1002.eqiad.wmnet with reason: host reimage
09:21 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
09:20 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
09:20 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
09:20 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
09:20 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
09:19 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
09:12 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 30%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54400 and previous config saved to /var/cache/conftool/dbconfig/20231214-091238-arnaudb.json
09:12 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cumin1002.eqiad.wmnet with OS bullseye
09:10 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host cumin1002.eqiad.wmnet
09:10 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cumin1002.eqiad.wmnet with OS bullseye
09:10 apergos: UTC morning backport and config window done
09:09 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
09:08 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
09:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
09:07 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1182.eqiad.wmnet onto db1233.eqiad.wmnet
09:07 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2001.codfw.wmnet
09:06 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
09:06 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
09:06 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
09:03 ariel@deploy2002: Finished scap: Backport for RunSingleJob.php: Stop writing to $wgCommandLineMode (T353262) (duration: 09m 05s)
09:00 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2001.codfw.wmnet
08:57 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 20%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54399 and previous config saved to /var/cache/conftool/dbconfig/20231214-085733-arnaudb.json
08:56 ariel@deploy2002: ariel and matmarex: Continuing with sync
08:56 ariel@deploy2002: ariel and matmarex: Backport for RunSingleJob.php: Stop writing to $wgCommandLineMode (T353262) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:54 ariel@deploy2002: Started scap: Backport for RunSingleJob.php: Stop writing to $wgCommandLineMode (T353262)
08:47 ariel@deploy2002: Finished scap: Backport for RunSingleJob.php: Remove overly complicated error handling (T353262) (duration: 08m 39s)
08:42 arnaudb@cumin1001: dbctl commit (dc=all): 'db1226 (re)pooling @ 10%: Post clone db1226 repooling', diff saved to https://phabricator.wikimedia.org/P54398 and previous config saved to /var/cache/conftool/dbconfig/20231214-084228-arnaudb.json
08:40 ariel@deploy2002: matmarex and ariel: Continuing with sync
08:39 ariel@deploy2002: matmarex and ariel: Backport for RunSingleJob.php: Remove overly complicated error handling (T353262) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:38 ariel@deploy2002: Started scap: Backport for RunSingleJob.php: Remove overly complicated error handling (T353262)
08:35 ariel@deploy2002: Finished scap: Backport for Remove references to refreshMessageBlobs.php (T314947) (duration: 10m 20s)
08:34 XioNoX: drain eqiad-codfw Arelion link for 100G migration
08:27 ariel@deploy2002: ariel and matmarex: Continuing with sync
08:26 ariel@deploy2002: ariel and matmarex: Backport for Remove references to refreshMessageBlobs.php (T314947) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:24 ariel@deploy2002: Started scap: Backport for Remove references to refreshMessageBlobs.php (T314947)
08:20 ariel@deploy2002: Finished scap: Backport for use virtual db domain for CentralAuth and GlobalBlocking (T348486) (duration: 10m 33s)
08:13 ariel@deploy2002: ariel: Continuing with sync
08:11 ariel@deploy2002: ariel: Backport for use virtual db domain for CentralAuth and GlobalBlocking (T348486) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:10 ariel@deploy2002: Started scap: Backport for use virtual db domain for CentralAuth and GlobalBlocking (T348486)
08:08 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
08:02 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
08:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cumin1002.eqiad.wmnet with OS bullseye
08:01 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
08:00 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
07:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM cumin1002.eqiad.wmnet - jmm@cumin2002"
07:54 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM cumin1002.eqiad.wmnet - jmm@cumin2002"
07:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cumin1002.eqiad.wmnet on all recursors
07:54 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache cumin1002.eqiad.wmnet on all recursors
07:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cumin1002.eqiad.wmnet - jmm@cumin2002"
07:53 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cumin1002.eqiad.wmnet - jmm@cumin2002"
07:50 jmm@cumin2002: START - Cookbook sre.dns.netbox
07:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host cumin1002.eqiad.wmnet
07:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
07:49 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
07:48 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1182.eqiad.wmnet onto db1233.eqiad.wmnet
07:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
03:24 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host testhost2001.codfw.wmnet with OS bullseye
03:06 bvibber: cleanupOrphanedTranscodes complete. requeueTranscodes continues... forever and ever and ever
02:54 bvibber: brion running cleanupOrphanedTranscodes on commonswiki on mwmaint2002
01:26 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gitlab1003.wikimedia.org with reason: upgrade gitlab1003 to new version https://phabricator.wikmedia.org/T353375
01:25 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on gitlab1003.wikimedia.org with reason: upgrade gitlab1003 to new version https://phabricator.wikmedia.org/T353375
01:04 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gitlab1003.wikimedia.org with reason: upgrade gitlab1003 to new version https://phabricator.wikmedia.org/T353375
01:04 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:15:00 on gitlab1003.wikimedia.org with reason: upgrade gitlab1003 to new version https://phabricator.wikmedia.org/T353375
00:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bullseye
00:40 dzahn@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release
00:40 dzahn@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
00:38 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release
00:38 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
00:34 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=93) on GitLab host gitlab1003.wikimedia.org with reason: security release
00:34 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
00:18 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts planet1002.eqiad.wmnet
00:18 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:18 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: planet1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
00:17 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: planet1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin2002"
00:15 dzahn@cumin2002: START - Cookbook sre.dns.netbox
00:11 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts planet1002.eqiad.wmnet

2023-12-13

23:48 brett@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host acmechief1002.eqiad.wmnet
23:48 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host acmechief1002.eqiad.wmnet with OS bookworm
23:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bullseye
23:21 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1107.eqiad.wmnet with OS bookworm
23:17 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply new extra plugins - bking@cumin2002 - T353270
23:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1107.eqiad.wmnet with reason: host reimage
23:02 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1107.eqiad.wmnet with reason: host reimage
22:58 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1006.eqiad.wmnet with OS bullseye
22:58 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
22:57 jhuneidi@deploy2002: Finished scap: Backport for Update wgStatsTarget to port 9125 (T240685), [BC] Enable desktop diff and history pages on mobile (T350181 T353388) (duration: 09m 42s)
22:57 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
22:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1005.eqiad.wmnet with OS bullseye
22:54 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
22:53 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
22:50 jhuneidi@deploy2002: jhuneidi and jdlrobson and cwhite: Continuing with sync
22:49 jhuneidi@deploy2002: jhuneidi and jdlrobson and cwhite: Backport for Update wgStatsTarget to port 9125 (T240685), [BC] Enable desktop diff and history pages on mobile (T350181 T353388) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:48 jhuneidi@deploy2002: Started scap: Backport for Update wgStatsTarget to port 9125 (T240685), [BC] Enable desktop diff and history pages on mobile (T350181 T353388)
22:47 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1107.eqiad.wmnet with OS bookworm
22:45 jhuneidi@deploy2002: Finished scap: Backport for tests: Use MediaWikiIntegrationTestCase::setGroupPermissions (T353210), Temporarily disable isPreview in Parsoid's rendering (duration: 10m 08s)
22:45 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1107.eqiad.wmnet with OS bookworm
22:45 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1107.eqiad.wmnet with OS bookworm
22:42 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1006.eqiad.wmnet with reason: host reimage
22:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1004.eqiad.wmnet with OS bullseye
22:40 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
22:39 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1005.eqiad.wmnet with reason: host reimage
22:39 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
22:38 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1107.eqiad.wmnet with OS bookworm
22:38 jhuneidi@deploy2002: ssastry and jhuneidi: Continuing with sync
22:38 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1107.eqiad.wmnet with OS bookworm
22:37 jhuneidi@deploy2002: ssastry and jhuneidi: Backport for tests: Use MediaWikiIntegrationTestCase::setGroupPermissions (T353210), Temporarily disable isPreview in Parsoid's rendering synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:37 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1006.eqiad.wmnet with reason: host reimage
22:36 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1005.eqiad.wmnet with reason: host reimage
22:35 brett@cumin2002: START - Cookbook sre.hosts.reimage for host acmechief1002.eqiad.wmnet with OS bookworm
22:35 jhuneidi@deploy2002: Started scap: Backport for tests: Use MediaWikiIntegrationTestCase::setGroupPermissions (T353210), Temporarily disable isPreview in Parsoid's rendering
22:26 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1004.eqiad.wmnet with reason: host reimage
22:24 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1006.eqiad.wmnet with OS bullseye
22:24 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1005.eqiad.wmnet with OS bullseye
22:22 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1004.eqiad.wmnet with reason: host reimage
22:18 jhuneidi@deploy2002: Finished scap: Backport for Partially undeploy Reader Demographics 2 survey (T344393), Enable $wgStatsTarget for requests to mwdebug (T240685) (duration: 12m 33s)
22:18 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM acmechief1002.eqiad.wmnet - brett@cumin2002"
22:17 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM acmechief1002.eqiad.wmnet - brett@cumin2002"
22:16 brett@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) acmechief1002.eqiad.wmnet on all recursors
22:16 brett@cumin2002: START - Cookbook sre.dns.wipe-cache acmechief1002.eqiad.wmnet on all recursors
22:16 brett@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:16 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM acmechief1002.eqiad.wmnet - brett@cumin2002"
22:15 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM acmechief1002.eqiad.wmnet - brett@cumin2002"
22:12 brett@cumin2002: START - Cookbook sre.dns.netbox
22:11 brett@cumin2002: START - Cookbook sre.ganeti.makevm for new host acmechief1002.eqiad.wmnet
22:11 jhuneidi@deploy2002: dani and jhuneidi and cwhite: Continuing with sync
22:09 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1004.eqiad.wmnet with OS bullseye
22:07 jhuneidi@deploy2002: dani and jhuneidi and cwhite: Backport for Partially undeploy Reader Demographics 2 survey (T344393), Enable $wgStatsTarget for requests to mwdebug (T240685) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:05 jhuneidi@deploy2002: Started scap: Backport for Partially undeploy Reader Demographics 2 survey (T344393), Enable $wgStatsTarget for requests to mwdebug (T240685)
22:01 jhuneidi@deploy2002: Finished scap: Backport for Restore fixed width and height, direction of arrow on change list pages (T352456 T353099) (duration: 10m 28s)
21:59 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply new extra plugins - bking@cumin2002 - T353270
21:54 jhuneidi@deploy2002: jhuneidi and jdlrobson: Continuing with sync
21:52 jhuneidi@deploy2002: jhuneidi and jdlrobson: Backport for Restore fixed width and height, direction of arrow on change list pages (T352456 T353099) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:50 jhuneidi@deploy2002: Started scap: Backport for Restore fixed width and height, direction of arrow on change list pages (T352456 T353099)
21:04 cstone: civicrm upgraded from 834606ef to e2d49d10
20:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts planet1002.eqiad.wmnet
20:33 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:32 dzahn@cumin1001: START - Cookbook sre.dns.netbox
20:28 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts planet1002.eqiad.wmnet
19:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2031.codfw.wmnet
19:31 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2031.codfw.wmnet
19:19 brennen@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.9 refs T350085 (duration: 07m 29s)
19:12 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.9 refs T350085
19:03 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply new extra plugins - bking@cumin2002 - T353270
19:01 brennen: 1.42.0-wmf.9 (T350085) status: no blockers, rolling to group1
18:07 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
18:07 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
18:06 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
18:05 rzl@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
17:58 rzl@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
17:57 rzl@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
17:44 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply new extra plugins - bking@cumin2002 - T353270
17:27 rzl@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
17:25 rzl@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
16:56 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore1006']
16:56 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore1005']
16:56 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore1004']
16:55 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore1006']
16:55 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore1005']
16:55 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore1004']
16:53 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1148.eqiad.wmnet onto db1248.eqiad.wmnet
16:39 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
16:39 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new extra plugins - bking@cumin2002 - T353270
16:38 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
16:38 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore1006.mgmt.eqiad.wmnet with reboot policy FORCED
16:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 100%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54395 and previous config saved to /var/cache/conftool/dbconfig/20231213-163657-arnaudb.json
16:36 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore1005.mgmt.eqiad.wmnet with reboot policy FORCED
16:36 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:36 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:35 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:35 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:34 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:34 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:31 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore1004.mgmt.eqiad.wmnet with reboot policy FORCED
16:30 vriley@cumin1001: START - Cookbook sre.hosts.provision for host sessionstore1006.mgmt.eqiad.wmnet with reboot policy FORCED
16:29 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sessionstore1006
16:28 vriley@cumin1001: START - Cookbook sre.hosts.provision for host sessionstore1005.mgmt.eqiad.wmnet with reboot policy FORCED
16:27 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sessionstore1006
16:27 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
16:26 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sessionstore1005
16:26 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
16:25 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sessionstore1005
16:23 vriley@cumin1001: START - Cookbook sre.hosts.provision for host sessionstore1004.mgmt.eqiad.wmnet with reboot policy FORCED
16:22 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sessionstore1004
16:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 90%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54394 and previous config saved to /var/cache/conftool/dbconfig/20231213-162152-arnaudb.json
16:20 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host sessionstore1004
16:19 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
16:19 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
16:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
16:16 ladsgroup@deploy2002: Finished scap: Backport for Fix my email in the key list (duration: 08m 45s)
16:15 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
16:15 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
16:14 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new extra plugins - bking@cumin2002 - T353270
16:13 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
16:12 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
16:12 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
16:11 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
16:10 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
16:09 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
16:09 ladsgroup@deploy2002: ladsgroup: Continuing with sync
16:09 ladsgroup@deploy2002: ladsgroup: Backport for Fix my email in the key list synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:08 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
16:08 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
16:07 ladsgroup@deploy2002: Started scap: Backport for Fix my email in the key list
16:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 80%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54393 and previous config saved to /var/cache/conftool/dbconfig/20231213-160647-arnaudb.json
16:05 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
16:05 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
16:04 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
16:04 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
16:04 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
16:04 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
16:04 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
16:03 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/blubberoid: apply
16:03 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
16:03 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
16:01 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
16:01 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
16:01 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
16:01 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
16:00 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
16:00 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
16:00 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
16:00 akosiaris: upgrade apertium, blubberoid everywhere to use the latest service_proxy image, 1.23.10-2-s4-20231203 T352906
15:59 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/apertium: apply
15:59 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
15:59 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
15:59 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/apertium: apply
15:59 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/apertium: apply
15:58 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/apertium: apply
15:58 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
15:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1132.eqiad.wmnet onto db1232.eqiad.wmnet
15:58 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
15:57 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
15:56 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
15:56 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
15:52 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
15:51 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
15:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 70%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54392 and previous config saved to /var/cache/conftool/dbconfig/20231213-155142-arnaudb.json
15:51 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
15:51 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
15:50 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
15:49 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
15:46 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
15:45 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
15:44 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
15:43 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
15:40 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
15:39 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
15:39 claime: Deploying shellbox: update php-fpm-exporter version - 982432
15:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 60%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54389 and previous config saved to /var/cache/conftool/dbconfig/20231213-153636-arnaudb.json
15:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1147.eqiad.wmnet
15:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1147.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:35 Amir1: tagging 1.41.0-rc.0 in core
15:35 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1147.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:34 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:34 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
15:33 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
15:28 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1147.eqiad.wmnet
15:25 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1129.eqiad.wmnet
15:25 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:25 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1129.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:24 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1129.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:21 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
15:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 50%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54387 and previous config saved to /var/cache/conftool/dbconfig/20231213-152131-arnaudb.json
15:17 arnaudb@cumin1001: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
15:16 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1129.eqiad.wmnet
15:15 ladsgroup@deploy2002: Finished scap: Backport for docroot: Add my pgp key (duration: 09m 50s)
15:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1128.eqiad.wmnet
15:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1128.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:12 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1128.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:10 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
15:09 ladsgroup@deploy2002: ladsgroup: Continuing with sync
15:07 ladsgroup@deploy2002: ladsgroup: Backport for docroot: Add my pgp key synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 40%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54386 and previous config saved to /var/cache/conftool/dbconfig/20231213-150626-arnaudb.json
15:06 ladsgroup@deploy2002: Started scap: Backport for docroot: Add my pgp key
15:05 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1128.eqiad.wmnet
15:04 arnaudb@cumin1001: dbctl commit (dc=all): 'decommission db1128 29 and 47', diff saved to https://phabricator.wikimedia.org/P54385 and previous config saved to /var/cache/conftool/dbconfig/20231213-150425-arnaudb.json
15:00 Lucas_WMDE: UTC afternoon backport+config window done
15:00 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for CheckUser: Enable read new for event tables migration on group1 (T341829) (duration: 08m 29s)
14:53 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and dreamyjazz: Continuing with sync
14:53 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and dreamyjazz: Backport for CheckUser: Enable read new for event tables migration on group1 (T341829) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:51 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for CheckUser: Enable read new for event tables migration on group1 (T341829)
14:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 30%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54384 and previous config saved to /var/cache/conftool/dbconfig/20231213-145121-arnaudb.json
14:49 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Utilities/Yaml: Use string as value with ini_set (T348496) (duration: 19m 09s)
14:43 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
14:43 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
14:42 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and abi: Continuing with sync
14:42 hashar: Restarted Gerrit on gerrit1003 and gerrit2002
14:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 20%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54383 and previous config saved to /var/cache/conftool/dbconfig/20231213-143616-arnaudb.json
14:33 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
14:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and abi: Backport for Utilities/Yaml: Use string as value with ini_set (T348496) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:30 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Utilities/Yaml: Use string as value with ini_set (T348496)
14:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1031.eqiad.wmnet
14:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 10%: Post clone (source of db1226) repooling', diff saved to https://phabricator.wikimedia.org/P54381 and previous config saved to /var/cache/conftool/dbconfig/20231213-142111-arnaudb.json
14:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
14:02 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1148.eqiad.wmnet onto db1248.eqiad.wmnet
14:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1148 in db1248 for T344036', diff saved to https://phabricator.wikimedia.org/P54380 and previous config saved to /var/cache/conftool/dbconfig/20231213-140017-arnaudb.json
13:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: provisionning db1248.eqiad.wmnet - T344036
13:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: provisionning db1248.eqiad.wmnet - T344036
13:57 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: provisionning db1248.eqiad.wmnet - T344036
13:53 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
13:53 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
13:51 arnaudb@cumin1001: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
13:51 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
13:50 moritzm: installing postgresql-11 security updates
13:49 arnaudb@cumin1001: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
13:48 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1129.eqiad.wmnet onto db1233.eqiad.wmnet
13:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1129 in db1233 for T344036', diff saved to https://phabricator.wikimedia.org/P54379 and previous config saved to /var/cache/conftool/dbconfig/20231213-134632-arnaudb.json
13:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: provisionning db1233.eqiad.wmnet - T344036
13:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: provisionning db1233.eqiad.wmnet - T344036
13:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: provisionning db1233.eqiad.wmnet - T344036
13:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: provisionning db1233.eqiad.wmnet - T344036
13:27 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1132.eqiad.wmnet onto db1232.eqiad.wmnet
13:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1132 in db1232 for T344036', diff saved to https://phabricator.wikimedia.org/P54376 and previous config saved to /var/cache/conftool/dbconfig/20231213-132511-arnaudb.json
13:24 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: provisionning db1232.eqiad.wmnet - T344036
13:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: provisionning db1232.eqiad.wmnet - T344036
13:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: provisionning db1232.eqiad.wmnet - T344036
13:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: provisionning db1232.eqiad.wmnet - T344036
13:05 godog: delete raw replica blocks for prometheus/ops (only one replica) in eqiad - T351927
12:55 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1211.eqiad.wmnet onto db1226.eqiad.wmnet
12:42 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:42 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:40 moritzm: installing OpenSSH security updates on bullseye
12:25 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:25 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:16 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:16 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:11 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:11 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:10 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:09 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:08 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:08 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1233.eqiad.wmnet with OS bookworm
12:02 vgutierrez: setting cp4037 as inactive - T352876
11:49 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1233.eqiad.wmnet with reason: host reimage
11:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1233.eqiad.wmnet with reason: host reimage
11:37 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:36 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
11:33 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db1233.eqiad.wmnet with OS bookworm
11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
11:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
11:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
11:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
11:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5004.eqsin.wmnet
11:00 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
10:50 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
10:49 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1211.eqiad.wmnet onto db1226.eqiad.wmnet
10:48 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5004.eqsin.wmnet
10:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
10:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
10:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
10:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
10:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1226.eqiad.wmnet with OS bookworm
10:31 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
10:24 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
10:24 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
10:24 claime: Updating mw-debug prometheus-php-fpm-exporter to 0.0.3
10:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1226.eqiad.wmnet with reason: host reimage
10:11 hashar@deploy2002: Finished deploy [releng/jenkins-deploy@77b3681] (releasing): Rename jenkins-slave to jenkins-agent - T254646 (duration: 00m 42s)
10:11 hashar@deploy2002: Started deploy [releng/jenkins-deploy@77b3681] (releasing): Rename jenkins-slave to jenkins-agent - T254646
10:10 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1226.eqiad.wmnet with reason: host reimage
10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 100%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54374 and previous config saved to /var/cache/conftool/dbconfig/20231213-100708-arnaudb.json
10:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54373 and previous config saved to /var/cache/conftool/dbconfig/20231213-100651-arnaudb.json
10:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 100%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54372 and previous config saved to /var/cache/conftool/dbconfig/20231213-100555-arnaudb.json
10:00 moritzm: failover ganeti master in eqsin to ganeti5007
09:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
09:57 arnaudb@cumin1001: START - Cookbook sre.hosts.reimage for host db1226.eqiad.wmnet with OS bookworm
09:56 hashar: Disabled puppet agent on contint1002, contint2002, releases1003 and releases2003 to progressively deploy https://gerrit.wikimedia.org/r/922555
09:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 90%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54371 and previous config saved to /var/cache/conftool/dbconfig/20231213-095203-arnaudb.json
09:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 90%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54370 and previous config saved to /var/cache/conftool/dbconfig/20231213-095146-arnaudb.json
09:50 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 90%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54369 and previous config saved to /var/cache/conftool/dbconfig/20231213-095049-arnaudb.json
09:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5005.eqsin.wmnet
09:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5005.eqsin.wmnet
09:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5006.eqsin.wmnet
09:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 80%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54368 and previous config saved to /var/cache/conftool/dbconfig/20231213-093658-arnaudb.json
09:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 80%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54367 and previous config saved to /var/cache/conftool/dbconfig/20231213-093641-arnaudb.json
09:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 80%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54366 and previous config saved to /var/cache/conftool/dbconfig/20231213-093544-arnaudb.json
09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5006.eqsin.wmnet
09:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5006.eqsin.wmnet
09:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
09:25 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
09:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
09:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:25 brouberol: increasing pod max requested memory to a higher value than the container max requested memory for dse-k8s-eqiad - T351722
09:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 70%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54365 and previous config saved to /var/cache/conftool/dbconfig/20231213-092153-arnaudb.json
09:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 70%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54364 and previous config saved to /var/cache/conftool/dbconfig/20231213-092136-arnaudb.json
09:20 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 70%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54363 and previous config saved to /var/cache/conftool/dbconfig/20231213-092039-arnaudb.json
09:20 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
09:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
09:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 60%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54362 and previous config saved to /var/cache/conftool/dbconfig/20231213-090648-arnaudb.json
09:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 60%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54361 and previous config saved to /var/cache/conftool/dbconfig/20231213-090631-arnaudb.json
09:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 60%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54360 and previous config saved to /var/cache/conftool/dbconfig/20231213-090534-arnaudb.json
08:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 202120
08:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 202120
08:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 50%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54359 and previous config saved to /var/cache/conftool/dbconfig/20231213-085143-arnaudb.json
08:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54358 and previous config saved to /var/cache/conftool/dbconfig/20231213-085125-arnaudb.json
08:50 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 50%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54357 and previous config saved to /var/cache/conftool/dbconfig/20231213-085027-arnaudb.json
08:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5007.eqsin.wmnet
08:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3856
08:48 XioNoX: delete bgp group Confed_drmrs from cr1-esams - T347892
08:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3856
08:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5007.eqsin.wmnet
08:44 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 46997
08:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 46997
08:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 40%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54356 and previous config saved to /var/cache/conftool/dbconfig/20231213-083638-arnaudb.json
08:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 40%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54355 and previous config saved to /var/cache/conftool/dbconfig/20231213-083620-arnaudb.json
08:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 40%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54354 and previous config saved to /var/cache/conftool/dbconfig/20231213-083522-arnaudb.json
08:30 XioNoX: delete bgp group Confed_esams from cr2-drmrs - T347892
08:25 mlitn@deploy2002: Finished scap: Backport for No custom UW licensing config (duration: 09m 43s)
08:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 30%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54353 and previous config saved to /var/cache/conftool/dbconfig/20231213-082133-arnaudb.json
08:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 30%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54352 and previous config saved to /var/cache/conftool/dbconfig/20231213-082115-arnaudb.json
08:20 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 30%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54351 and previous config saved to /var/cache/conftool/dbconfig/20231213-082017-arnaudb.json
08:18 mlitn@deploy2002: mlitn: Continuing with sync
08:17 mlitn@deploy2002: mlitn: Backport for No custom UW licensing config synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:16 mlitn@deploy2002: Started scap: Backport for No custom UW licensing config
08:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1020.eqiad.wmnet with OS bookworm
08:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 20%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54350 and previous config saved to /var/cache/conftool/dbconfig/20231213-080628-arnaudb.json
08:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54349 and previous config saved to /var/cache/conftool/dbconfig/20231213-080610-arnaudb.json
08:06 moritzm: installing openssh security updates
08:05 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 20%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54348 and previous config saved to /var/cache/conftool/dbconfig/20231213-080512-arnaudb.json
07:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1020.eqiad.wmnet with reason: host reimage
07:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1020.eqiad.wmnet with reason: host reimage
07:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1229 (re)pooling @ 10%: Post reboot repooling', diff saved to https://phabricator.wikimedia.org/P54347 and previous config saved to /var/cache/conftool/dbconfig/20231213-075123-arnaudb.json
07:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54346 and previous config saved to /var/cache/conftool/dbconfig/20231213-075105-arnaudb.json
07:50 arnaudb@cumin1001: dbctl commit (dc=all): 'db1247 (re)pooling @ 10%: Post clone repooling', diff saved to https://phabricator.wikimedia.org/P54345 and previous config saved to /var/cache/conftool/dbconfig/20231213-075006-arnaudb.json
07:43 arnaudb@cumin1001: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1211.eqiad.wmnet onto db1226.eqiad.wmnet
07:40 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1020.eqiad.wmnet with OS bookworm
06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1021.eqiad.wmnet with OS bookworm
06:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1021.eqiad.wmnet with reason: host reimage
06:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1021.eqiad.wmnet with reason: host reimage
05:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1021.eqiad.wmnet with OS bookworm
03:41 hashar@deploy2002: Finished deploy [gerrit/gerrit@9bf8914]: Add a banner for the 2023 developer survey - T351109 (duration: 00m 08s)
03:41 hashar@deploy2002: Started deploy [gerrit/gerrit@9bf8914]: Add a banner for the 2023 developer survey - T351109

2023-12-12

23:56 ejegg: donorwiki upgraded from f7407053 to bc49e5a6
23:26 tzatziki: removing 2 files for legal compliance
23:05 tzatziki: removing 2 files for legal compliance
22:57 mutante: planet - switched to eqiad and bookworm backend (T348392 T345617) - https://meta.wikimedia.org/wiki/Planet_Wikimedia
22:43 mutante: planet2003 - manually upgrade rawdog package to 3.0.2 T348392
21:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on planet1003.eqiad.wmnet with reason: debugging
21:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on planet1003.eqiad.wmnet with reason: debugging
21:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on planet1003.eqiad.wmnet with reason: debugging
21:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on planet2003.codfw.wmnet with reason: debugging
21:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on planet2003.codfw.wmnet with reason: debugging
21:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on planet2003.codfw.wmnet with reason: reimage
21:18 samtar@deploy2002: Finished scap: Backport for Add stream config for Android article instruments (T351292) (duration: 11m 59s)
21:10 samtar@deploy2002: cjming and samtar: Continuing with sync
21:07 samtar@deploy2002: cjming and samtar: Backport for Add stream config for Android article instruments (T351292) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:06 samtar@deploy2002: Started scap: Backport for Add stream config for Android article instruments (T351292)
20:42 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
20:40 rzl@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
20:38 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
20:37 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
20:33 rzl@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
20:30 rzl@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
20:28 rzl@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:17 rzl@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:05 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
20:04 rzl@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
19:59 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: enable new wmf-elasticsearch-search-plugins - bking@cumin2002 - T353270
19:57 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
19:56 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
19:46 ryankemper@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:46 ryankemper@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:43 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.9 refs T350085
19:33 brennen@deploy2002: Finished scap: Backport for ParserOutput::getText(): do not clone ParserOutput when invoking pipeline (T353257) (duration: 09m 41s)
19:26 brennen@deploy2002: brennen and ssastry: Continuing with sync
19:25 brennen@deploy2002: brennen and ssastry: Backport for ParserOutput::getText(): do not clone ParserOutput when invoking pipeline (T353257) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:23 brennen@deploy2002: Started scap: Backport for ParserOutput::getText(): do not clone ParserOutput when invoking pipeline (T353257)
19:18 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: enable new wmf-elasticsearch-search-plugins - bking@cumin2002 - T353270
19:08 brennen: 1.42.0-wmf.9 (T350085) status: deploying a fix for T353257 and then will proceed to group0.
19:03 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: enable new wmf-elasticsearch-search-plugins - bking@cumin2002 - T353270
19:03 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: enable new wmf-elasticsearch-search-plugins - bking@cumin2002 - T353270
18:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host phab2002.codfw.wmnet with OS bullseye
18:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab2002.codfw.wmnet with reason: host reimage
18:32 rzl@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
18:31 rzl@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
18:29 rzl@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
18:28 rzl@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
18:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet with reason: host reimage
18:12 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host phab2002.codfw.wmnet with OS bullseye
18:10 mutante: reimaging phab2002 (stand-by phorge server) with bullseye - T327068
17:42 ejegg: fundraising civicrm upgraded from 8c107215 to 834606ef
17:33 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:33 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt sessionstore - jclark@cumin1001"
17:32 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt sessionstore - jclark@cumin1001"
17:32 ejegg: payments-wiki upgraded from 1d24dc90 to c1181b95
17:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host testhost2001.codfw.wmnet with OS bullseye
17:30 jclark@cumin1001: START - Cookbook sre.dns.netbox
17:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on phab2002.codfw.wmnet with reason: reimage
17:16 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on phab2002.codfw.wmnet with reason: reimage
17:16 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
17:16 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:13 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
17:13 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:13 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:34 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ml-staging2001.codfw.wmnet with reason: Waiting for hardware install
16:33 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ml-staging2001.codfw.wmnet with reason: Waiting for hardware install
16:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2001.codfw.wmnet
16:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes1060']
16:19 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1060']
16:15 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2001.codfw.wmnet
16:05 brennen@deploy2002: Finished deploy [phabricator/deployment@c243cc2]: deploy to phab1004 for T353274 (duration: 00m 48s)
16:04 brennen@deploy2002: Started deploy [phabricator/deployment@c243cc2]: deploy to phab1004 for T353274
16:04 brennen@deploy2002: Finished deploy [phabricator/deployment@c243cc2]: test deploy to phab2002 for T353274 (duration: 00m 32s)
16:03 brennen@deploy2002: Started deploy [phabricator/deployment@c243cc2]: test deploy to phab2002 for T353274
16:03 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1004.eqiad.wmnet with reason: Phabricator deploys
16:03 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1004.eqiad.wmnet with reason: Phabricator deploys
16:00 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
15:59 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
15:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host testhost2001.codfw.wmnet with OS bullseye
15:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1137.eqiad.wmnet onto db1237.eqiad.wmnet
15:30 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
15:30 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
15:30 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
15:29 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
15:28 cgoubert@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:28 cgoubert@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:27 cgoubert@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
15:27 cgoubert@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
15:27 cgoubert@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
15:26 cgoubert@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
15:25 cgoubert@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
15:25 cgoubert@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
15:25 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:24 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
15:23 cgoubert@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
15:22 cgoubert@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
15:22 cgoubert@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
15:21 cgoubert@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
15:21 claime: Deploying new calico BGPPeers for codfw rows a/b - T352893
14:54 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1137.eqiad.wmnet onto db1237.eqiad.wmnet
14:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1137 in db1237 for T344036', diff saved to https://phabricator.wikimedia.org/P54339 and previous config saved to /var/cache/conftool/dbconfig/20231212-145205-arnaudb.json
14:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1237.eqiad.wmnet with reason: provisionning db1237.eqiad.wmnet - T344036
14:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1237.eqiad.wmnet with reason: provisionning db1237.eqiad.wmnet - T344036
14:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: provisionning db1237.eqiad.wmnet - T344036
14:50 dcausse: restarting blazegraph on wdqs1012 (BlazegraphFreeAllocatorsDecreasingRapidly)
14:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: provisionning db1237.eqiad.wmnet - T344036
14:48 phuedx: UTC afternoon backport window done
14:47 phuedx@deploy2002: Finished scap: Backport for Partially undeploy Reader Demographics 2 survey (T344393) (duration: 24m 33s)
14:39 phuedx@deploy2002: phuedx and dani: Continuing with sync
14:35 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1211.eqiad.wmnet onto db1226.eqiad.wmnet
14:35 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on netbox-dev2002.codfw.wmnet with reason: Restoring DB from backup on netbox-dev2002
14:34 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on netbox-dev2002.codfw.wmnet with reason: Restoring DB from backup on netbox-dev2002
14:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1211 in db1226 for T344036', diff saved to https://phabricator.wikimedia.org/P54336 and previous config saved to /var/cache/conftool/dbconfig/20231212-143233-arnaudb.json
14:31 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
14:31 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
14:31 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
14:30 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: provisionning db1226.eqiad.wmnet - T344036
14:24 phuedx@deploy2002: phuedx and dani: Backport for Partially undeploy Reader Demographics 2 survey (T344393) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:22 phuedx@deploy2002: Started scap: Backport for Partially undeploy Reader Demographics 2 survey (T344393)
13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
13:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
13:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:45 brouberol: increasing max container memory requests in dse-k8s from 3GB to 8GB - T351722
13:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1128.eqiad.wmnet onto db1228.eqiad.wmnet
13:19 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1129.eqiad.wmnet onto db1229.eqiad.wmnet
13:16 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2002.codfw.wmnet
13:16 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster1002.eqiad.wmnet
13:09 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster1002.eqiad.wmnet
13:09 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2002.codfw.wmnet
13:06 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1147.eqiad.wmnet onto db1247.eqiad.wmnet
13:00 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster1001.eqiad.wmnet
12:57 jayme@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2001.codfw.wmnet
12:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host backup1011.eqiad.wmnet
12:55 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster1001.eqiad.wmnet
12:53 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
12:52 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
12:51 jayme@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2001.codfw.wmnet
12:46 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host backup1011.eqiad.wmnet
12:45 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host backup1010.eqiad.wmnet
12:45 jayme: increasing memory of ganeti instance kubemaster2001.codfw.wmnet from 4G to 12G (requires reboot) - T353233
12:38 claime: Uncordoning kubernetes10[59-62].eqiad.wmnet - T353135
12:37 claime: Pooling kubernetes10[59-62].eqiad.wmnet - T353135
12:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host backup2011.codfw.wmnet
12:24 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host backup2011.codfw.wmnet
12:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host backup2010.codfw.wmnet
12:03 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host backup2010.codfw.wmnet
11:43 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
11:43 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
11:28 moritzm: installing postgresql-11 security updates
10:50 samtar@deploy2002: Finished scap: Backport for testwiki: Enable the Edit Recovery feature (T353041) (duration: 09m 51s)
10:47 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1129.eqiad.wmnet onto db1229.eqiad.wmnet
10:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1129 in db1229 for T344036', diff saved to https://phabricator.wikimedia.org/P54335 and previous config saved to /var/cache/conftool/dbconfig/20231212-104404-arnaudb.json
10:43 samtar@deploy2002: samtar and samwilson: Continuing with sync
10:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: provisionning db1229.eqiad.wmnet - T344036
10:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: provisionning db1229.eqiad.wmnet - T344036
10:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: provisionning db1229.eqiad.wmnet - T344036
10:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: provisionning db1229.eqiad.wmnet - T344036
10:41 samtar@deploy2002: samtar and samwilson: Backport for testwiki: Enable the Edit Recovery feature (T353041) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:40 samtar@deploy2002: Started scap: Backport for testwiki: Enable the Edit Recovery feature (T353041)
10:30 moritzm: installing nghttp2 security updates
10:16 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
10:15 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
10:13 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
10:13 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
10:09 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
10:09 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
10:05 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
10:04 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
10:04 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
10:04 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
09:57 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1128.eqiad.wmnet onto db1228.eqiad.wmnet
09:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1228 clone from db1128 ', diff saved to https://phabricator.wikimedia.org/P54334 and previous config saved to /var/cache/conftool/dbconfig/20231212-095352-arnaudb.json
09:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: provisionning db1228.eqiad.wmnet - T344036
09:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: provisionning db1228.eqiad.wmnet - T344036
09:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: provisionning db1228.eqiad.wmnet - T344036
09:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: provisionning db1228.eqiad.wmnet - T344036
09:43 moritzm: installing ca-certificates-java updates from Bookworm point release
09:08 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1147.eqiad.wmnet onto db1247.eqiad.wmnet
09:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Cloning db1147 in db1247 for T344036', diff saved to https://phabricator.wikimedia.org/P54333 and previous config saved to /var/cache/conftool/dbconfig/20231212-090652-arnaudb.json
09:05 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
09:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: provisionning db1247.eqiad.wmnet - T344036
09:05 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: provisionning db1247.eqiad.wmnet - T344036
09:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: provisionning db1247.eqiad.wmnet - T344036
09:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: provisionning db1247.eqiad.wmnet - T344036
08:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: server BGP in netbox plugin - ayounsi@cumin1001
08:48 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: server BGP in netbox plugin - ayounsi@cumin1001
08:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2135,2160].codfw.wmnet,db[1176,1217].eqiad.wmnet with reason: m5 ipoid maintenance
08:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2135,2160].codfw.wmnet,db[1176,1217].eqiad.wmnet with reason: m5 ipoid maintenance
07:52 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
07:52 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
07:50 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
07:49 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
07:17 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 4800
07:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4800
06:46 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc2014 as master of pc1" (duration: 09m 00s)
06:38 marostegui@deploy2002: marostegui: Continuing with sync
06:38 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc2014 as master of pc1" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
06:37 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc2014 as master of pc1"
06:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2011.codfw.wmnet with OS bookworm
06:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2011.codfw.wmnet with reason: host reimage
06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2011.codfw.wmnet with reason: host reimage
06:00 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc2011.codfw.wmnet with OS bookworm
05:59 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc2014 as master of pc1 (T351787) (duration: 08m 35s)
05:52 marostegui@deploy2002: marostegui: Continuing with sync
05:52 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc2014 as master of pc1 (T351787) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
05:51 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc2014 as master of pc1 (T351787)
05:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: pc1 master switch T351787
05:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: pc1 master switch T351787
04:58 mwpresync@deploy2002: Pruned MediaWiki: 1.42.0-wmf.5 (duration: 02m 17s)
04:55 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.9 refs T350085 (duration: 53m 03s)
04:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.9 refs T350085

2023-12-11

22:39 jdrewniak@deploy2002: Finished scap: Backport for [Vector] Deploy the Zebra CSS refactor under feature flag (T353008) (duration: 12m 14s)
22:32 jdrewniak@deploy2002: jdrewniak: Continuing with sync
22:28 jdrewniak@deploy2002: jdrewniak: Backport for [Vector] Deploy the Zebra CSS refactor under feature flag (T353008) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:26 jdrewniak@deploy2002: Started scap: Backport for [Vector] Deploy the Zebra CSS refactor under feature flag (T353008)
22:23 ladsgroup@deploy2002: Finished scap: Backport for api: Add support for pagelinks migration in ApiQueryBacklinks::runSecondQuery (T351237) (duration: 10m 42s)
22:15 ladsgroup@deploy2002: jforrester and ladsgroup: Continuing with sync
22:14 ladsgroup@deploy2002: jforrester and ladsgroup: Backport for api: Add support for pagelinks migration in ApiQueryBacklinks::runSecondQuery (T351237) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:12 ladsgroup@deploy2002: Started scap: Backport for api: Add support for pagelinks migration in ApiQueryBacklinks::runSecondQuery (T351237)
22:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on wdqs1015.eqiad.wmnet with reason: T347355
22:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 18:00:00 on wdqs1015.eqiad.wmnet with reason: T347355
18:34 claime: Raised replicas for mw-web
18:32 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
18:32 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
18:32 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
18:32 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
18:32 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
18:31 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
17:48 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:47 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
17:47 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:45 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
17:45 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
17:43 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
17:43 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
17:42 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
17:04 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 08m 15s)
17:01 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2004.codfw.wmnet with OS bullseye
17:00 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:57 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:56 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 10m 12s)
16:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1060.eqiad.wmnet with OS bullseye
16:49 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1061.eqiad.wmnet with OS bullseye
16:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1062.eqiad.wmnet with OS bullseye
16:43 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1059.eqiad.wmnet with OS bullseye
16:42 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
16:39 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
16:27 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1060.eqiad.wmnet with reason: host reimage
16:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2002.codfw.wmnet with OS bullseye
16:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1061.eqiad.wmnet with reason: host reimage
16:23 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1062.eqiad.wmnet with reason: host reimage
16:22 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
16:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2005.codfw.wmnet with OS bullseye
16:21 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:20 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1059.eqiad.wmnet with reason: host reimage
16:19 otto@deploy2002: Synchronized wmf-config/ext-EventStreamConfig.php: Config: Enable canary events for all MediaWiki event streams (T266798) (duration: 08m 25s)
16:18 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1062.eqiad.wmnet with reason: host reimage
16:18 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1061.eqiad.wmnet with reason: host reimage
16:17 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1060.eqiad.wmnet with reason: host reimage
16:16 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1059.eqiad.wmnet with reason: host reimage
16:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:13 vgutierrez: rolling restart of pybal on lvs1020 and lvs1017 effectively enabling IPIP encapsulation on ncredir@eqiad - T351069
16:10 ottomata: enabling canary events for all mediawiki state change event streams - T266798
16:03 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1062.eqiad.wmnet with OS bullseye
16:03 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1061.eqiad.wmnet with OS bullseye
16:02 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1060.eqiad.wmnet with OS bullseye
16:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
16:01 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1059.eqiad.wmnet with OS bullseye
16:01 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
16:00 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
15:59 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
15:58 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
15:57 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
15:57 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
15:56 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
15:55 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
15:55 claime: homer lsw1-*eqiad* commit "Put kubernetes10[59-62] in production - T353135"
15:55 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
15:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2002.codfw.wmnet with reason: host reimage
15:55 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
15:55 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
15:54 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
15:53 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
15:53 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
15:53 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
15:51 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2002.codfw.wmnet with reason: host reimage
15:49 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
15:48 jayme@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:41 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2005.codfw.wmnet with OS bullseye
15:39 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2006.codfw.wmnet with OS bullseye
15:39 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
15:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1143.eqiad.wmnet
15:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1143.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
15:32 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1143.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
15:30 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
15:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
15:25 brouberol: provisioning TLS certificates for the spark-history and spark-history-test namespaces in dse-k8s-eqiad - T352639
15:25 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1143.eqiad.wmnet
15:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:23 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
15:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1142.eqiad.wmnet
15:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1142.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:20 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1142.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:18 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
15:12 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1142.eqiad.wmnet
15:08 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
15:04 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
15:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1141.eqiad.wmnet
15:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1141.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:01 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1141.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
14:57 milimetric@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
14:57 milimetric@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
14:57 milimetric@deploy2002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
14:56 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
14:56 milimetric@deploy2002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
14:53 milimetric@deploy2002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
14:53 milimetric@deploy2002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
14:53 arnaudb@cumin1001: dbctl commit (dc=all): 'decommission db1141 42 and 43', diff saved to https://phabricator.wikimedia.org/P54330 and previous config saved to /var/cache/conftool/dbconfig/20231211-145300-arnaudb.json
14:52 milimetric@deploy2002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
14:52 milimetric@deploy2002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
14:51 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1141.eqiad.wmnet
14:51 milimetric@deploy2002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
14:51 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
14:50 milimetric@deploy2002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
14:50 otto@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
14:49 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
14:49 milimetric@deploy2002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
14:48 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
14:48 milimetric@deploy2002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
14:47 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
14:46 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
14:45 otto@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
14:45 ottomata: deploying changeprop to pick up https://phabricator.wikimedia.org/T351247
14:37 TheresNoTime: close UTC afternoon backport window
14:25 samtar@deploy2002: Finished scap: Backport for hewikivoyage: update vector 2022 wordmark and tagline (T351981) (duration: 10m 35s)
14:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1138.eqiad.wmnet - arnaudb@cumin1001"
14:17 samtar@deploy2002: samtar and anzx: Continuing with sync
14:16 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1138.eqiad.wmnet - arnaudb@cumin1001"
14:15 samtar@deploy2002: samtar and anzx: Backport for hewikivoyage: update vector 2022 wordmark and tagline (T351981) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:14 samtar@deploy2002: Started scap: Backport for hewikivoyage: update vector 2022 wordmark and tagline (T351981)
14:11 samtar@deploy2002: Finished scap: Backport for Enable read new on group0 wikis (T341829) (duration: 07m 57s)
14:05 samtar@deploy2002: samtar and dreamyjazz: Continuing with sync
14:05 samtar@deploy2002: samtar and dreamyjazz: Backport for Enable read new on group0 wikis (T341829) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:03 samtar@deploy2002: Started scap: Backport for Enable read new on group0 wikis (T341829)
13:59 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
13:58 arnaudb@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
13:56 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
13:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
13:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
13:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
13:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
13:27 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts db1138.eqiad.wmnet
13:26 arnaudb@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
13:25 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
13:22 arnaudb@cumin1001: dbctl commit (dc=all): 'decommission db1138', diff saved to https://phabricator.wikimedia.org/P54328 and previous config saved to /var/cache/conftool/dbconfig/20231211-132250-arnaudb.json
13:20 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1138.eqiad.wmnet
13:17 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: decommission pre downtime
13:17 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: decommission pre downtime
13:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
13:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
13:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
13:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
12:57 claime: Rebuilding production-images for python3-build-bookworm - T352733
12:12 urbanecm@deploy2002: Finished scap: Backport for Revert "Growth: Enable Welcome survey user research for ar/en/es" (T351266) (duration: 08m 20s)
12:11 brouberol: Adding spark-history(-test).svc.eqiad.wmnet CNAMEs pointing to k8s-ingress-dse.svc.eqiad.wmnet. - T352639
12:05 urbanecm@deploy2002: urbanecm: Continuing with sync
12:05 urbanecm@deploy2002: urbanecm: Backport for Revert "Growth: Enable Welcome survey user research for ar/en/es" (T351266) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:03 urbanecm@deploy2002: Started scap: Backport for Revert "Growth: Enable Welcome survey user research for ar/en/es" (T351266)
11:20 vgutierrez: rolling restart of pybal on lvs3010 and lvs3008 effectively enabling IPIP encapsulation on ncredir@esams - T351069
11:18 claime: sudo confctl --object-type discovery select 'name=eqiad,dnsdisc=k8s-ingress-dse' set/pooled=true - T352639
11:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
11:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
11:12 brouberol: Add discovery records for the k8s-ingress-dse LVS service - T352639
10:55 dcausse: (properly) restarting blazegraph on wdqs1006 (BlazegraphFreeAllocatorsDecreasingRapidly)
10:54 cgoubert@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs[1019-1020].eqiad.wmnet} and A:lvs (T352639)
10:50 cgoubert@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs[1019-1020].eqiad.wmnet} and A:lvs (T352639)
10:46 claime: Running puppet on O:lvs::balancer - T352639
10:45 claime: Disabling puppet on O:lvs::balancer - T352639
10:42 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
10:42 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
10:42 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
10:38 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
10:38 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
10:38 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
10:37 claime: Repooling dse-k8s-worker nodes - sudo confctl select "service=kubesvc,cluster=dse-k8s" set/pooled=yes - T352639
10:03 jayme: removed cergen certs of all k8s services from private puppet in commit d36a97a - T300033
09:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38753
09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38753
09:55 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
09:55 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 1547
09:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 1547
09:50 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
09:50 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
09:44 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
09:44 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
08:43 kostajh: UTC morning deploys done
08:43 kharlan@deploy2002: Finished scap: Backport for ClusterConfig: Rename `isTest()` to `isDebug()` for consistency (T347366), IPInfo: Add comment clarifying $wgIPInfoGeoIP2EnterprisePath (T304604) (duration: 22m 02s)
08:40 dcausse: restarted blazegraph on wdqs1006 (BlazegraphFreeAllocatorsDecreasingRapidly)
08:36 kharlan@deploy2002: kharlan and d3r1ck01: Continuing with sync
08:22 kharlan@deploy2002: kharlan and d3r1ck01: Backport for ClusterConfig: Rename `isTest()` to `isDebug()` for consistency (T347366), IPInfo: Add comment clarifying $wgIPInfoGeoIP2EnterprisePath (T304604) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:21 kharlan@deploy2002: Started scap: Backport for ClusterConfig: Rename `isTest()` to `isDebug()` for consistency (T347366), IPInfo: Add comment clarifying $wgIPInfoGeoIP2EnterprisePath (T304604)
08:16 kharlan@deploy2002: Finished scap: Backport for MediaModeration: Set MediaModerationDeveloperMode to false (duration: 09m 55s)
08:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: reboot for upgrade
08:15 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: reboot for upgrade
08:09 kharlan@deploy2002: kharlan: Continuing with sync
08:07 kharlan@deploy2002: kharlan: Backport for MediaModeration: Set MediaModerationDeveloperMode to false synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:06 kharlan@deploy2002: Started scap: Backport for MediaModeration: Set MediaModerationDeveloperMode to false
07:53 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: reboot for upgrade
07:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: reboot for upgrade
07:31 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2185.codfw.wmnet with reason: reboot for upgrade
07:31 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2185.codfw.wmnet with reason: reboot for upgrade
07:24 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on db2185.codfw.wmnet with reason: reboot for upgrade
07:24 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2185.codfw.wmnet with reason: reboot for upgrade
07:12 marostegui: Failover m3-master from dbproxy1020 to dbproxy1026 T351864
07:12 marostegui: Failover m3-master from dbproxy1020 to dbproxy1026 org
06:44 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1" (duration: 08m 22s)
06:37 marostegui@deploy2002: marostegui: Continuing with sync
06:37 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
06:35 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1"
06:35 _joe_: update sirenbot to 0.3.7
06:34 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1011.eqiad.wmnet with OS bookworm
06:29 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
06:26 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
06:19 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
06:16 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
06:13 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
06:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1011.eqiad.wmnet with reason: host reimage
06:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1011.eqiad.wmnet with reason: host reimage
06:07 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
06:07 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
05:55 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host pc1011.eqiad.wmnet with OS bookworm
05:54 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc1 (T351787) (duration: 16m 54s)
05:47 marostegui@deploy2002: marostegui: Continuing with sync
05:46 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc1 (T351787) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
05:37 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc1 (T351787)
05:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: pc1 master switch T351787
05:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: pc1 master switch T351787

2023-12-09

15:53 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
15:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2006.codfw.wmnet with OS bullseye
15:49 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2005.codfw.wmnet with OS bullseye
01:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2002.codfw.wmnet with OS bullseye
00:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
00:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
00:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
00:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
00:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
00:47 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
00:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2005.codfw.wmnet with OS bullseye
00:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
00:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2005.codfw.wmnet with OS bullseye
00:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
00:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
00:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
00:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2006.codfw.wmnet with OS bullseye
00:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
00:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2005.codfw.wmnet with OS bullseye
00:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
00:01 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
00:01 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply

2023-12-08

23:49 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
23:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
23:48 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
23:48 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
23:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
23:47 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2003.codfw.wmnet with OS bullseye
23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2002.codfw.wmnet with OS bullseye
23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2003.codfw.wmnet with reason: host reimage
23:04 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:03 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:02 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2003.codfw.wmnet with reason: host reimage
22:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2003.codfw.wmnet with OS bullseye
22:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2006.codfw.wmnet with OS bullseye
22:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
22:26 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:26 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
21:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2001.codfw.wmnet with OS bullseye
21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2001.codfw.wmnet with reason: host reimage
21:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2001.codfw.wmnet with reason: host reimage
21:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2001.codfw.wmnet with OS bullseye
20:02 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:02 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
18:27 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:27 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
18:26 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:26 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
17:09 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:08 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:49 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:49 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs1015.eqiad.wmnet with reason: T347355
16:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs1015.eqiad.wmnet with reason: T347355
16:08 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
15:50 milimetric@deploy2002: Finished deploy [airflow-dags/platform_eng@049cf03]: (no justification provided) (duration: 00m 52s)
15:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
15:49 milimetric@deploy2002: Started deploy [airflow-dags/platform_eng@049cf03]: (no justification provided)
15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
15:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2006.codfw.wmnet with reason: host reimage
15:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2005.codfw.wmnet with reason: host reimage
15:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
15:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2004.codfw.wmnet with reason: host reimage
15:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
15:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2005.codfw.wmnet with OS bullseye
15:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
15:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
15:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
15:09 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
14:44 XioNoX: drain eqiad-codfw lumen transport for maintenance - T342502
14:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/echoserver: apply
14:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/echoserver: apply
14:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
12:55 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
12:55 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
12:42 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
12:42 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
11:40 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
11:40 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
10:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54322 and previous config saved to /var/cache/conftool/dbconfig/20231208-101337-arnaudb.json
09:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P54321 and previous config saved to /var/cache/conftool/dbconfig/20231208-095830-arnaudb.json
09:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P54320 and previous config saved to /var/cache/conftool/dbconfig/20231208-094324-arnaudb.json
09:41 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
09:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
09:41 brouberol: Creating the echoserver namespace in dse-k8s-eqiad - T353004
09:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54319 and previous config saved to /var/cache/conftool/dbconfig/20231208-092817-arnaudb.json
09:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54318 and previous config saved to /var/cache/conftool/dbconfig/20231208-091628-arnaudb.json
09:16 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
09:16 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
07:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 237
07:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 237
06:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
06:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T343198)', diff saved to https://phabricator.wikimedia.org/P54317 and previous config saved to /var/cache/conftool/dbconfig/20231208-062636-ladsgroup.json
06:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P54316 and previous config saved to /var/cache/conftool/dbconfig/20231208-061130-ladsgroup.json
05:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P54315 and previous config saved to /var/cache/conftool/dbconfig/20231208-055623-ladsgroup.json
05:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T343198)', diff saved to https://phabricator.wikimedia.org/P54314 and previous config saved to /var/cache/conftool/dbconfig/20231208-054116-ladsgroup.json
05:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1231 (T343198)', diff saved to https://phabricator.wikimedia.org/P54313 and previous config saved to /var/cache/conftool/dbconfig/20231208-050624-ladsgroup.json
05:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
05:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
04:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
04:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P54312 and previous config saved to /var/cache/conftool/dbconfig/20231208-041826-ladsgroup.json
04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P54311 and previous config saved to /var/cache/conftool/dbconfig/20231208-040319-ladsgroup.json
03:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P54310 and previous config saved to /var/cache/conftool/dbconfig/20231208-034813-ladsgroup.json
03:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P54309 and previous config saved to /var/cache/conftool/dbconfig/20231208-033306-ladsgroup.json
03:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P54308 and previous config saved to /var/cache/conftool/dbconfig/20231208-030005-ladsgroup.json
03:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
02:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
02:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P54307 and previous config saved to /var/cache/conftool/dbconfig/20231208-025942-ladsgroup.json
02:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P54306 and previous config saved to /var/cache/conftool/dbconfig/20231208-024435-ladsgroup.json
02:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P54305 and previous config saved to /var/cache/conftool/dbconfig/20231208-022929-ladsgroup.json
02:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
02:19 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
02:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
02:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
02:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sessionstore2004']
02:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2004']
02:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
02:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
02:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P54304 and previous config saved to /var/cache/conftool/dbconfig/20231208-021422-ladsgroup.json
02:12 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2004.codfw.wmnet with OS bullseye
01:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P54303 and previous config saved to /var/cache/conftool/dbconfig/20231208-012115-ladsgroup.json
01:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
01:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
01:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P54302 and previous config saved to /var/cache/conftool/dbconfig/20231208-012051-ladsgroup.json
01:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P54301 and previous config saved to /var/cache/conftool/dbconfig/20231208-010545-ladsgroup.json
00:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P54300 and previous config saved to /var/cache/conftool/dbconfig/20231208-005038-ladsgroup.json
00:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1036.eqiad.wmnet with OS bullseye
00:44 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:43 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1037.eqiad.wmnet with OS bullseye
00:43 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:42 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:38 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1035.eqiad.wmnet with OS bullseye
00:38 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:37 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:36 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1038.eqiad.wmnet with OS bullseye
00:36 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P54299 and previous config saved to /var/cache/conftool/dbconfig/20231208-003532-ladsgroup.json
00:35 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti1036.eqiad.wmnet with reason: host reimage
00:24 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1037.eqiad.wmnet with reason: host reimage
00:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1035.eqiad.wmnet with reason: host reimage
00:19 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1038.eqiad.wmnet with reason: host reimage
00:16 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1037.eqiad.wmnet with reason: host reimage
00:16 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1036.eqiad.wmnet with reason: host reimage
00:15 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1035.eqiad.wmnet with reason: host reimage
00:15 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1038.eqiad.wmnet with reason: host reimage
00:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1038.eqiad.wmnet with OS bullseye
00:00 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1037.eqiad.wmnet with OS bullseye
00:00 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1036.eqiad.wmnet with OS bullseye
00:00 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1035.eqiad.wmnet with OS bullseye

2023-12-07

23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P54298 and previous config saved to /var/cache/conftool/dbconfig/20231207-235333-ladsgroup.json
23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T343198)', diff saved to https://phabricator.wikimedia.org/P54297 and previous config saved to /var/cache/conftool/dbconfig/20231207-235310-ladsgroup.json
23:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1061.eqiad.wmnet with OS bullseye
23:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1062.eqiad.wmnet with OS bullseye
23:52 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
23:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1059.eqiad.wmnet with OS bullseye
23:52 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
23:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1060.eqiad.wmnet with OS bullseye
23:52 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
23:52 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
23:47 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:47 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P54296 and previous config saved to /var/cache/conftool/dbconfig/20231207-233802-ladsgroup.json
23:23 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:23 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:23 ryankemper@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P54295 and previous config saved to /var/cache/conftool/dbconfig/20231207-232256-ladsgroup.json
23:21 ryankemper@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
23:21 ryankemper@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
23:21 ryankemper@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
23:17 ryankemper@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:15 ryankemper@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
23:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T343198)', diff saved to https://phabricator.wikimedia.org/P54294 and previous config saved to /var/cache/conftool/dbconfig/20231207-230749-ladsgroup.json
23:05 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
22:58 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
22:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp4037.ulsfo.wmnet
22:53 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
22:48 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
22:38 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1061.eqiad.wmnet with reason: host reimage
22:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1060.eqiad.wmnet with reason: host reimage
22:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1062.eqiad.wmnet with reason: host reimage
22:33 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1059.eqiad.wmnet with reason: host reimage
22:31 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1061.eqiad.wmnet with reason: host reimage
22:30 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1062.eqiad.wmnet with reason: host reimage
22:30 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1060.eqiad.wmnet with reason: host reimage
22:29 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1059.eqiad.wmnet with reason: host reimage
22:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T343198)', diff saved to https://phabricator.wikimedia.org/P54293 and previous config saved to /var/cache/conftool/dbconfig/20231207-222656-ladsgroup.json
22:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
22:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
22:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T343198)', diff saved to https://phabricator.wikimedia.org/P54292 and previous config saved to /var/cache/conftool/dbconfig/20231207-222633-ladsgroup.json
22:22 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:22 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
22:20 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:20 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
22:19 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:19 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
22:16 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1062.eqiad.wmnet with OS bullseye
22:16 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1061.eqiad.wmnet with OS bullseye
22:16 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1060.eqiad.wmnet with OS bullseye
22:15 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1059.eqiad.wmnet with OS bullseye
22:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1061.mgmt.eqiad.wmnet with reboot policy FORCED
22:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1060.mgmt.eqiad.wmnet with reboot policy FORCED
22:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1062.mgmt.eqiad.wmnet with reboot policy FORCED
22:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1059.mgmt.eqiad.wmnet with reboot policy FORCED
22:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P54291 and previous config saved to /var/cache/conftool/dbconfig/20231207-221127-ladsgroup.json
22:10 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:10 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
21:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1062.mgmt.eqiad.wmnet with reboot policy FORCED
21:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1061.mgmt.eqiad.wmnet with reboot policy FORCED
21:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1060.mgmt.eqiad.wmnet with reboot policy FORCED
21:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1059.mgmt.eqiad.wmnet with reboot policy FORCED
21:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P54290 and previous config saved to /var/cache/conftool/dbconfig/20231207-215620-ladsgroup.json
21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T343198)', diff saved to https://phabricator.wikimedia.org/P54289 and previous config saved to /var/cache/conftool/dbconfig/20231207-214114-ladsgroup.json
21:38 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@049cf03]: (no justification provided) (duration: 00m 28s)
21:37 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@049cf03]: (no justification provided)
21:31 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1082.eqiad.wmnet with OS bullseye
21:31 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
21:23 jdrewniak@deploy2002: Finished scap: Backport for Enable Vector beta feature for all wikis (T351339), [beta] ores-extension: enable revertrisk model for enwiki (T348298), Enable action blocks in Serbian Wikipedia (T351873) (duration: 09m 54s)
21:17 jdrewniak@deploy2002: zoranzoki21 and isaranto and jdlrobson and jdrewniak: Continuing with sync
21:15 jdrewniak@deploy2002: zoranzoki21 and isaranto and jdlrobson and jdrewniak: Backport for Enable Vector beta feature for all wikis (T351339), [beta] ores-extension: enable revertrisk model for enwiki (T348298), Enable action blocks in Serbian Wikipedia (T351873) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:13 jdrewniak@deploy2002: Started scap: Backport for Enable Vector beta feature for all wikis (T351339), [beta] ores-extension: enable revertrisk model for enwiki (T348298), Enable action blocks in Serbian Wikipedia (T351873)
21:06 otto@deploy2002: Synchronized wmf-config/ext-EventStreamConfig.php: Config: Remove eventlogging_FeaturePolicyViolation and _SpecialMuteSubmit EventStreamConfig (T329718) (duration: 09m 16s)
21:02 dcausse: restarting blazegraph on wdqs2017 (BlazegraphFreeAllocatorsDecreasingRapidly)
20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T343198)', diff saved to https://phabricator.wikimedia.org/P54288 and previous config saved to /var/cache/conftool/dbconfig/20231207-205817-ladsgroup.json
20:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
20:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T343198)', diff saved to https://phabricator.wikimedia.org/P54287 and previous config saved to /var/cache/conftool/dbconfig/20231207-205753-ladsgroup.json
20:56 otto@deploy2002: Synchronized wmf-config/ext-EventLogging.php: Config: Remove eventlogging_FeaturePolicyViolation and _SpecialMuteSubmit EventLogging config (T329718) (duration: 07m 07s)
20:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P54286 and previous config saved to /var/cache/conftool/dbconfig/20231207-204247-ladsgroup.json
20:30 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp4037.ulsfo.wmnet
20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P54285 and previous config saved to /var/cache/conftool/dbconfig/20231207-202740-ladsgroup.json
20:27 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
20:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T343198)', diff saved to https://phabricator.wikimedia.org/P54283 and previous config saved to /var/cache/conftool/dbconfig/20231207-201234-ladsgroup.json
20:06 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1082.eqiad.wmnet with reason: host reimage
20:05 urandom: bootstrap Cassandra/restbase2030-a — T352468
20:02 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1082.eqiad.wmnet with reason: host reimage
20:01 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:01 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:59 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:59 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:49 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1082.eqiad.wmnet with OS bullseye
19:38 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:38 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:35 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wdqs[1022-1024].eqiad.wmnet with reason: graph split experiments T350106
19:35 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wdqs[1022-1024].eqiad.wmnet with reason: graph split experiments T350106
19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T343198)', diff saved to https://phabricator.wikimedia.org/P54282 and previous config saved to /var/cache/conftool/dbconfig/20231207-192949-ladsgroup.json
19:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
19:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T343198)', diff saved to https://phabricator.wikimedia.org/P54281 and previous config saved to /var/cache/conftool/dbconfig/20231207-192926-ladsgroup.json
19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P54280 and previous config saved to /var/cache/conftool/dbconfig/20231207-191420-ladsgroup.json
18:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P54279 and previous config saved to /var/cache/conftool/dbconfig/20231207-185913-ladsgroup.json
18:45 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:45 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T343198)', diff saved to https://phabricator.wikimedia.org/P54278 and previous config saved to /var/cache/conftool/dbconfig/20231207-184406-ladsgroup.json
18:42 mutante: puppetmaster1001 - revoke cert for miscweb.discovery.wmnet
18:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
18:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
18:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
18:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T343198)', diff saved to https://phabricator.wikimedia.org/P54277 and previous config saved to /var/cache/conftool/dbconfig/20231207-180427-ladsgroup.json
18:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
18:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
18:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
18:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
17:58 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs1024.eqiad.wmnet
17:57 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs1024.eqiad.wmnet
17:40 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
17:40 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
17:39 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
17:38 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
17:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2002.codfw.wmnet with OS bullseye
17:09 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:09 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cleanup logstash/kibana records T299700 - herron@cumin1001"
17:08 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cleanup logstash/kibana records T299700 - herron@cumin1001"
17:05 herron@cumin1001: START - Cookbook sre.dns.netbox
16:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
16:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
16:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
16:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
16:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
16:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2002.codfw.wmnet with OS bullseye
16:38 brouberol@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
16:27 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
16:27 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
16:26 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
16:26 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
16:25 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
16:24 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
16:24 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
16:23 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
16:09 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
16:09 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
16:02 sukhe: run dummy authdns-update on dns6001
16:00 milimetric@deploy2002: Finished deploy [analytics/refinery@8b8f178] (thin): hotfix: sqoop (duration: 00m 07s)
16:00 milimetric@deploy2002: Started deploy [analytics/refinery@8b8f178] (thin): hotfix: sqoop
15:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54274 and previous config saved to /var/cache/conftool/dbconfig/20231207-155712-arnaudb.json
15:55 milimetric@deploy2002: Finished deploy [analytics/refinery@8b8f178]: hotfix: sqoop (duration: 10m 08s)
15:53 sukhe: running authdns-update with broken resolv.conf on dns6001
15:48 sukhe: clear out dns6001 resolv.conf to check for SSH config-based authdns-update
15:45 milimetric@deploy2002: Started deploy [analytics/refinery@8b8f178]: hotfix: sqoop
15:45 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
15:44 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
15:44 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
15:44 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
15:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P54273 and previous config saved to /var/cache/conftool/dbconfig/20231207-154205-arnaudb.json
15:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2006.codfw.wmnet with OS bullseye
15:36 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2005.codfw.wmnet with OS bullseye
15:29 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
15:28 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
15:28 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
15:27 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
15:27 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
15:27 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
15:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P54272 and previous config saved to /var/cache/conftool/dbconfig/20231207-152659-arnaudb.json
15:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
15:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cp4037.ulsfo.wmnet
15:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54271 and previous config saved to /var/cache/conftool/dbconfig/20231207-151152-arnaudb.json
15:08 klausman@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
15:08 klausman@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
15:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54270 and previous config saved to /var/cache/conftool/dbconfig/20231207-150750-arnaudb.json
15:07 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
15:07 klausman@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
15:07 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
15:07 klausman@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
15:06 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
15:06 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
15:04 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
15:03 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
15:02 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
15:01 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
15:01 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
15:00 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
14:53 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
14:53 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
14:53 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
14:53 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
14:53 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
14:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2002.codfw.wmnet with OS bullseye
14:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
14:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2006.codfw.wmnet with OS bullseye
14:48 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2005.codfw.wmnet with OS bullseye
14:48 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2004.codfw.wmnet with OS bullseye
14:41 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp4037.ulsfo.wmnet
14:32 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
14:31 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
14:30 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
14:29 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
14:26 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
14:26 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
14:26 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
13:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
13:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
13:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
13:52 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
13:49 ladsgroup@deploy2002: Finished scap: Backport for api: Only force backlink namespace index when there is one ns only (T351237) (duration: 10m 55s)
13:42 ladsgroup@deploy2002: jforrester and ladsgroup: Continuing with sync
13:40 ladsgroup@deploy2002: jforrester and ladsgroup: Backport for api: Only force backlink namespace index when there is one ns only (T351237) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:38 ladsgroup@deploy2002: Started scap: Backport for api: Only force backlink namespace index when there is one ns only (T351237)
13:34 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:34 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:34 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:33 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:32 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:32 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:31 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:31 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:27 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
13:27 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'ores-legacy' for release 'main' .
13:25 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: sync
13:25 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:25 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: sync
13:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
13:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
13:19 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
13:18 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
13:10 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
13:09 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
13:09 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
13:09 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
13:09 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
13:08 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
13:07 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
13:07 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
12:52 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
12:52 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
12:48 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:48 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:47 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:47 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1001.eqiad.wmnet
12:18 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
12:18 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
12:17 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
12:17 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
12:17 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
12:16 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
12:13 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1001.eqiad.wmnet
11:51 btullis@deploy2002: Finished deploy [analytics/refinery@b6499b1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b6499b17] (duration: 03m 17s)
11:48 btullis@deploy2002: Started deploy [analytics/refinery@b6499b1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b6499b17]
11:33 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
11:33 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
11:30 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
11:30 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
11:30 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
11:30 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
11:17 klausman@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
11:17 klausman@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
11:14 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
11:14 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
11:13 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
11:13 klausman@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
11:12 klausman@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
11:10 aikochou@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
11:10 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
11:01 brouberol@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
10:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
10:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
10:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: cluster::management
10:53 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
10:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
10:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
10:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
10:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
10:45 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: cluster::management
10:38 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
10:38 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
10:35 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
10:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
10:34 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
10:34 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
10:33 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
10:33 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
10:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
10:32 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
10:27 brouberol@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
10:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
10:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
10:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
10:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
09:42 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
09:42 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
09:41 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
09:40 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
09:40 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
09:39 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
08:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 31 days, 0:00:00 on sretest1001.eqiad.wmnet with reason: WIP nftables
08:52 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 31 days, 0:00:00 on sretest1001.eqiad.wmnet with reason: WIP nftables
08:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4005.wikimedia.org
08:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4005.wikimedia.org
08:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1119.eqiad.wmnet
06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1119.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
06:52 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1119.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
06:50 marostegui@cumin1001: START - Cookbook sre.dns.netbox
06:44 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1119.eqiad.wmnet
06:35 marostegui: Failover m5-master from dbproxy1021 to dbproxy1027 T351864
00:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1082.eqiad.wmnet with OS bullseye
00:53 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1081.eqiad.wmnet with OS bullseye
00:53 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1080.eqiad.wmnet with OS bullseye
00:53 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:53 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"

2023-12-06

23:56 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1082.eqiad.wmnet with OS bullseye
23:47 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
23:42 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
23:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1081.eqiad.wmnet with reason: host reimage
23:23 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1080.eqiad.wmnet with reason: host reimage
23:20 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1081.eqiad.wmnet with reason: host reimage
23:19 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1080.eqiad.wmnet with reason: host reimage
23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 34 days, 0:00:00 on wdqs1024.eqiad.wmnet with reason: T352878
23:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 34 days, 0:00:00 on wdqs1024.eqiad.wmnet with reason: T352878
22:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1081.eqiad.wmnet with OS bullseye
22:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1082.eqiad.wmnet with OS bullseye
22:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1080.eqiad.wmnet with OS bullseye
22:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1080']
22:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1082']
22:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1081']
22:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1080']
22:43 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-be1081']
22:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1081']
22:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1080']
22:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1082']
22:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1081']
22:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1080']
22:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1082.mgmt.eqiad.wmnet with reboot policy FORCED
22:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1080.mgmt.eqiad.wmnet with reboot policy FORCED
22:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1081.mgmt.eqiad.wmnet with reboot policy FORCED
21:56 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1082.mgmt.eqiad.wmnet with reboot policy FORCED
21:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1080.mgmt.eqiad.wmnet with reboot policy FORCED
21:51 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1081.mgmt.eqiad.wmnet with reboot policy FORCED
21:51 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:50 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
21:50 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
21:47 jclark@cumin1001: START - Cookbook sre.dns.netbox
21:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1082.mgmt.eqiad.wmnet with reboot policy FORCED
21:43 urbanecm@deploy2002: Finished scap: Backport for Correct links to beta feature (T352826), Beta Features: Allow Vector 2022 typography feature (T351339) (duration: 10m 51s)
21:36 urbanecm@deploy2002: urbanecm and jdlrobson: Continuing with sync
21:35 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1082.mgmt.eqiad.wmnet with reboot policy FORCED
21:35 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:35 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
21:34 urbanecm@deploy2002: urbanecm and jdlrobson: Backport for Correct links to beta feature (T352826), Beta Features: Allow Vector 2022 typography feature (T351339) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:34 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
21:33 urbanecm@deploy2002: Started scap: Backport for Correct links to beta feature (T352826), Beta Features: Allow Vector 2022 typography feature (T351339)
21:32 jclark@cumin1001: START - Cookbook sre.dns.netbox
21:31 urbanecm@deploy2002: Finished scap: Backport for DiscussionTools: Rename config (duration: 10m 01s)
21:25 urbanecm@deploy2002: esanders and urbanecm: Continuing with sync
21:22 urbanecm@deploy2002: esanders and urbanecm: Backport for DiscussionTools: Rename config synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:21 urbanecm@deploy2002: Started scap: Backport for DiscussionTools: Rename config
21:20 urbanecm@deploy2002: Finished scap: Backport for Enable DT visual enhancements on pages with (T352232) (duration: 10m 43s)
21:13 urbanecm@deploy2002: urbanecm and esanders: Continuing with sync
21:11 urbanecm@deploy2002: urbanecm and esanders: Backport for Enable DT visual enhancements on pages with (T352232) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:09 urbanecm@deploy2002: Started scap: Backport for Enable DT visual enhancements on pages with (T352232)
20:55 ejegg: fundraising civicrm upgraded from 6ca683b2 to 8c107215
19:07 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host wdqs1024.eqiad.wmnet
18:55 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1024.eqiad.wmnet
18:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on wdqs1024.eqiad.wmnet with reason: T352878
18:53 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on wdqs1024.eqiad.wmnet with reason: T352878
18:18 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2001.codfw.wmnet with OS bullseye
18:02 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
17:47 ejegg: standalone SmashPig upgraded from 83d509ed to fc74ccca
17:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
17:34 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
17:17 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp4037.ulsfo.wmnet
17:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4037.ulsfo.wmnet with OS bullseye
17:06 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp4037.ulsfo.wmnet
17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cp4037.ulsfo.wmnet
17:05 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp4037.ulsfo.wmnet
16:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd2001.codfw.wmnet with OS bullseye
16:51 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
16:41 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
16:40 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
16:40 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
16:40 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
16:29 urandom: bootstrapping Cassandra/restbase2020-a — T352468
16:07 milimetric@deploy2002: Finished deploy [airflow-dags/platform_eng@db1cb48]: in order to run the querypage job (duration: 01m 28s)
16:05 milimetric@deploy2002: Started deploy [airflow-dags/platform_eng@db1cb48]: in order to run the querypage job
15:56 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: sync
15:56 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: sync
15:52 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:51 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
15:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
15:48 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
15:48 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
15:47 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
15:46 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
15:46 urandom: restarting Cassandra on aqs2001-{a,b,c} (testing puppet 7 migration)
15:41 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: sessionstore
15:39 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
15:39 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
15:38 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
15:38 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
15:38 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
15:37 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
15:35 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:34 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:33 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: sessionstore
15:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd2001.codfw.wmnet with OS bullseye
15:32 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2001.codfw.wmnet with OS bullseye
15:30 jforrester@deploy2002: Finished scap: Backport for Beta Features: Move ULS Compact Links to only the wikis it's enabled on, Beta Features: Drop Popups, deployed everywhere for ages (duration: 11m 33s)
15:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd2003.mgmt.codfw.wmnet with reboot policy FORCED
15:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cephosd2001.mgmt.codfw.wmnet with reboot policy FORCED
15:28 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
15:28 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
15:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2003.mgmt.codfw.wmnet with reboot policy FORCED
15:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
15:28 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
15:28 jayme@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2001.mgmt.codfw.wmnet with reboot policy FORCED
15:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cephosd2003.mgmt.codfw.wmnet with reboot policy FORCED
15:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
15:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cephosd2001.mgmt.codfw.wmnet with reboot policy FORCED
15:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: restbase::production
15:23 sukhe: depool cp4037 for reimage testing: T350179
15:23 jforrester@deploy2002: jforrester: Continuing with sync
15:21 jforrester@deploy2002: jforrester: Backport for Beta Features: Move ULS Compact Links to only the wikis it's enabled on, Beta Features: Drop Popups, deployed everywhere for ages synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2003.mgmt.codfw.wmnet with reboot policy FORCED
15:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2002.mgmt.codfw.wmnet with reboot policy FORCED
15:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cephosd2001.mgmt.codfw.wmnet with reboot policy FORCED
15:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['testhost2001']
15:19 jforrester@deploy2002: Started scap: Backport for Beta Features: Move ULS Compact Links to only the wikis it's enabled on, Beta Features: Drop Popups, deployed everywhere for ages
15:14 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['testhost2001']
15:04 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: restbase::production
15:02 moritzm: installing mariadb bugfix updates from Bookworm point release (as packaged in Debian, unrelated to wmf-mariadb packages)
14:43 moritzm: installing debian-archive-keyring updates from Bookworm point release
14:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dnsbox
14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:32 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:23 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dnsbox
14:21 fabfur: repooling cp4052 after reimage (bookworm -> bullseye) due to possible impacting T352744
13:48 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
13:48 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
13:46 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4052.ulsfo.wmnet
13:45 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
13:45 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
13:37 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1001.eqiad.wmnet with OS bookworm
13:37 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1001"
13:20 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4052.ulsfo.wmnet
13:12 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@bfd944e]: Add metrics configuration TEST [airflow-dags@bfd944e4] (duration: 00m 11s)
13:12 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@bfd944e]: Add metrics configuration TEST [airflow-dags@bfd944e4]
13:08 moritzm: installing systemd bugfix updates from Bookworm point release
12:52 mvernon@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1001"
12:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4041.ulsfo.wmnet
12:41 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4041.ulsfo.wmnet
12:34 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1001.eqiad.wmnet with reason: host reimage
12:33 moritzm: installing pam bugfix updates from Bookworm point release
12:30 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1001.eqiad.wmnet with reason: host reimage
12:15 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1001.eqiad.wmnet with OS bookworm
11:48 hnowlan: rollback changeprop-jobqueue
11:44 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: druid::analytics::worker
11:43 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
11:42 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
11:41 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
11:40 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
11:33 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: druid::analytics::worker
11:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4044.ulsfo.wmnet
11:16 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4044.ulsfo.wmnet
10:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4050.ulsfo.wmnet
10:38 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4050.ulsfo.wmnet
10:26 moritzm: installing gtk+3.0 bug fix updates from Bookworm point release
08:49 godog: test rsyslog version from bullseye-backports on centrallog - T351710
08:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54264 and previous config saved to /var/cache/conftool/dbconfig/20231206-084928-arnaudb.json
08:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P54263 and previous config saved to /var/cache/conftool/dbconfig/20231206-083422-arnaudb.json
08:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P54262 and previous config saved to /var/cache/conftool/dbconfig/20231206-081915-arnaudb.json
08:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4047.ulsfo.wmnet
08:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54261 and previous config saved to /var/cache/conftool/dbconfig/20231206-080409-arnaudb.json
07:54 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4047.ulsfo.wmnet
07:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54260 and previous config saved to /var/cache/conftool/dbconfig/20231206-075333-arnaudb.json
07:53 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
07:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
07:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T348183)', diff saved to https://phabricator.wikimedia.org/P54259 and previous config saved to /var/cache/conftool/dbconfig/20231206-075309-arnaudb.json
07:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P54258 and previous config saved to /var/cache/conftool/dbconfig/20231206-073803-arnaudb.json
07:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P54257 and previous config saved to /var/cache/conftool/dbconfig/20231206-072256-arnaudb.json
07:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T348183)', diff saved to https://phabricator.wikimedia.org/P54256 and previous config saved to /var/cache/conftool/dbconfig/20231206-070749-arnaudb.json
06:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T348183)', diff saved to https://phabricator.wikimedia.org/P54255 and previous config saved to /var/cache/conftool/dbconfig/20231206-062922-arnaudb.json
06:29 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
06:29 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
06:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T348183)', diff saved to https://phabricator.wikimedia.org/P54254 and previous config saved to /var/cache/conftool/dbconfig/20231206-062859-arnaudb.json
06:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P54252 and previous config saved to /var/cache/conftool/dbconfig/20231206-061352-arnaudb.json
05:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P54251 and previous config saved to /var/cache/conftool/dbconfig/20231206-055846-arnaudb.json
05:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T348183)', diff saved to https://phabricator.wikimedia.org/P54250 and previous config saved to /var/cache/conftool/dbconfig/20231206-054339-arnaudb.json
05:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T348183)', diff saved to https://phabricator.wikimedia.org/P54249 and previous config saved to /var/cache/conftool/dbconfig/20231206-053321-arnaudb.json
05:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
05:33 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
05:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T348183)', diff saved to https://phabricator.wikimedia.org/P54248 and previous config saved to /var/cache/conftool/dbconfig/20231206-053256-arnaudb.json
05:19 denisse@deploy2002: Finished deploy [librenms/librenms@f049593]: Upgrade T351616 (duration: 00m 09s)
05:19 denisse@deploy2002: Started deploy [librenms/librenms@f049593]: Upgrade T351616
05:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P54247 and previous config saved to /var/cache/conftool/dbconfig/20231206-051750-arnaudb.json
05:09 ejegg: fundraising civicrm upgraded from 6bb8a67f to 6ca683b2
05:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P54246 and previous config saved to /var/cache/conftool/dbconfig/20231206-050243-arnaudb.json
04:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T348183)', diff saved to https://phabricator.wikimedia.org/P54245 and previous config saved to /var/cache/conftool/dbconfig/20231206-044737-arnaudb.json
04:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T348183)', diff saved to https://phabricator.wikimedia.org/P54244 and previous config saved to /var/cache/conftool/dbconfig/20231206-043718-arnaudb.json
04:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
04:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
04:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
04:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
04:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54243 and previous config saved to /var/cache/conftool/dbconfig/20231206-043638-arnaudb.json
04:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P54242 and previous config saved to /var/cache/conftool/dbconfig/20231206-042132-arnaudb.json
04:14 ejegg: standalone (payments listener) SmashPig upgraded from f24afba3 to 83d509ed
04:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P54241 and previous config saved to /var/cache/conftool/dbconfig/20231206-040625-arnaudb.json
03:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54240 and previous config saved to /var/cache/conftool/dbconfig/20231206-035119-arnaudb.json
03:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54239 and previous config saved to /var/cache/conftool/dbconfig/20231206-034045-arnaudb.json
03:40 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
03:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
03:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54238 and previous config saved to /var/cache/conftool/dbconfig/20231206-034022-arnaudb.json
03:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P54237 and previous config saved to /var/cache/conftool/dbconfig/20231206-032516-arnaudb.json
03:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P54236 and previous config saved to /var/cache/conftool/dbconfig/20231206-031009-arnaudb.json
02:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54235 and previous config saved to /var/cache/conftool/dbconfig/20231206-025503-arnaudb.json
02:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T348183)', diff saved to https://phabricator.wikimedia.org/P54234 and previous config saved to /var/cache/conftool/dbconfig/20231206-024108-arnaudb.json
02:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
02:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
02:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T348183)', diff saved to https://phabricator.wikimedia.org/P54233 and previous config saved to /var/cache/conftool/dbconfig/20231206-024045-arnaudb.json
02:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P54232 and previous config saved to /var/cache/conftool/dbconfig/20231206-022538-arnaudb.json
02:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P54231 and previous config saved to /var/cache/conftool/dbconfig/20231206-021031-arnaudb.json
02:08 eileen: civicrm upgraded from 7fb98ee8 to 6bb8a67f
02:01 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
02:00 jhancock@cumin2002: START - Cookbook sre.dns.netbox
01:59 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:58 jhancock@cumin2002: START - Cookbook sre.dns.netbox
01:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T348183)', diff saved to https://phabricator.wikimedia.org/P54230 and previous config saved to /var/cache/conftool/dbconfig/20231206-015519-arnaudb.json
01:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:51 jhancock@cumin2002: START - Cookbook sre.dns.netbox
01:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T348183)', diff saved to https://phabricator.wikimedia.org/P54229 and previous config saved to /var/cache/conftool/dbconfig/20231206-014506-arnaudb.json
01:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
01:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
01:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T348183)', diff saved to https://phabricator.wikimedia.org/P54228 and previous config saved to /var/cache/conftool/dbconfig/20231206-014443-arnaudb.json
01:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
01:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
01:40 jhancock@cumin2002: START - Cookbook sre.dns.netbox
01:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
01:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
01:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
01:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
01:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
01:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
01:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:29 jhancock@cumin2002: START - Cookbook sre.dns.netbox
01:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P54227 and previous config saved to /var/cache/conftool/dbconfig/20231206-012936-arnaudb.json
01:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2003.codfw.wmnet with OS bullseye
01:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bullseye
01:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cephosd2001.codfw.wmnet with OS bullseye
01:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
01:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
01:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
01:21 eileen: civicrm upgraded from d8238788 to 7fb98ee8
01:20 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:20 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
01:17 jhancock@cumin2002: START - Cookbook sre.dns.netbox
01:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P54226 and previous config saved to /var/cache/conftool/dbconfig/20231206-011430-arnaudb.json
01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
01:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
01:10 jhancock@cumin2002: START - Cookbook sre.dns.netbox
01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
01:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating ceph to cephosd to codfw - jhancock@cumin2002"
01:03 jhancock@cumin2002: START - Cookbook sre.dns.netbox
01:01 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox
00:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T348183)', diff saved to https://phabricator.wikimedia.org/P54225 and previous config saved to /var/cache/conftool/dbconfig/20231206-005923-arnaudb.json
00:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T348183)', diff saved to https://phabricator.wikimedia.org/P54224 and previous config saved to /var/cache/conftool/dbconfig/20231206-004820-arnaudb.json
00:48 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
00:48 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
00:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T348183)', diff saved to https://phabricator.wikimedia.org/P54223 and previous config saved to /var/cache/conftool/dbconfig/20231206-004756-arnaudb.json
00:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P54222 and previous config saved to /var/cache/conftool/dbconfig/20231206-003249-arnaudb.json
00:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P54221 and previous config saved to /var/cache/conftool/dbconfig/20231206-001742-arnaudb.json
00:17 ejegg: civicrm upgraded from 297a091d to d8238788
00:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T348183)', diff saved to https://phabricator.wikimedia.org/P54220 and previous config saved to /var/cache/conftool/dbconfig/20231206-000236-arnaudb.json

2023-12-05

23:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T348183)', diff saved to https://phabricator.wikimedia.org/P54219 and previous config saved to /var/cache/conftool/dbconfig/20231205-235213-arnaudb.json
23:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
23:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
23:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
23:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
23:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T348183)', diff saved to https://phabricator.wikimedia.org/P54218 and previous config saved to /var/cache/conftool/dbconfig/20231205-234425-arnaudb.json
23:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P54217 and previous config saved to /var/cache/conftool/dbconfig/20231205-232918-arnaudb.json
23:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P54216 and previous config saved to /var/cache/conftool/dbconfig/20231205-231412-arnaudb.json
22:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T348183)', diff saved to https://phabricator.wikimedia.org/P54215 and previous config saved to /var/cache/conftool/dbconfig/20231205-225905-arnaudb.json
22:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T348183)', diff saved to https://phabricator.wikimedia.org/P54214 and previous config saved to /var/cache/conftool/dbconfig/20231205-224838-arnaudb.json
22:48 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
22:48 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
22:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T348183)', diff saved to https://phabricator.wikimedia.org/P54213 and previous config saved to /var/cache/conftool/dbconfig/20231205-224816-arnaudb.json
22:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P54212 and previous config saved to /var/cache/conftool/dbconfig/20231205-223309-arnaudb.json
22:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P54211 and previous config saved to /var/cache/conftool/dbconfig/20231205-221803-arnaudb.json
22:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T348183)', diff saved to https://phabricator.wikimedia.org/P54210 and previous config saved to /var/cache/conftool/dbconfig/20231205-220256-arnaudb.json
22:01 jforrester@deploy2002: Finished scap: Backport for Define the corresponding stream for scroll (T350883), Add stream config for *webuiactions via Metrics Platform (T351298) (duration: 19m 01s)
21:53 jforrester@deploy2002: ksarabia and jforrester and cjming: Continuing with sync
21:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T348183)', diff saved to https://phabricator.wikimedia.org/P54209 and previous config saved to /var/cache/conftool/dbconfig/20231205-215135-arnaudb.json
21:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
21:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
21:43 jforrester@deploy2002: ksarabia and jforrester and cjming: Backport for Define the corresponding stream for scroll (T350883), Add stream config for *webuiactions via Metrics Platform (T351298) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
21:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
21:42 jforrester@deploy2002: Started scap: Backport for Define the corresponding stream for scroll (T350883), Add stream config for *webuiactions via Metrics Platform (T351298)
21:40 jforrester@deploy2002: Finished scap: Backport for [Zebra] Make .vector-column-start cache compatible (T347712 T351830), Fix nonzebra sticky container scrolling behavior and scrollable indicator (T352464) (duration: 12m 50s)
21:35 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
21:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
21:34 jforrester@deploy2002: jdlrobson and jforrester and jdrewniak: Continuing with sync
21:30 jforrester@deploy2002: jdlrobson and jforrester and jdrewniak: Backport for [Zebra] Make .vector-column-start cache compatible (T347712 T351830), Fix nonzebra sticky container scrolling behavior and scrollable indicator (T352464) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:27 jforrester@deploy2002: Started scap: Backport for [Zebra] Make .vector-column-start cache compatible (T347712 T351830), Fix nonzebra sticky container scrolling behavior and scrollable indicator (T352464)
21:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
21:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
21:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T348183)', diff saved to https://phabricator.wikimedia.org/P54208 and previous config saved to /var/cache/conftool/dbconfig/20231205-212707-arnaudb.json
21:27 jforrester@deploy2002: Finished scap: Backport for Deploy VectorClientPreferences to beta on pl,fr,ca,fa,tr wikis (T351339) (duration: 13m 44s)
21:19 jforrester@deploy2002: bwang and jforrester: Continuing with sync
21:13 jforrester@deploy2002: Started scap: Backport for Deploy VectorClientPreferences to beta on pl,fr,ca,fa,tr wikis (T351339)
21:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P54207 and previous config saved to /var/cache/conftool/dbconfig/20231205-211200-arnaudb.json
21:11 jforrester@deploy2002: Finished scap: Backport for Revert "Do not try to use Thumbor on beta" (T344605), nlwikivoyage: Drop Listings extension (T352696), Drop Listings extension from Wikivoyages where unused (T352719) (duration: 08m 45s)
21:04 jforrester@deploy2002: tgr and jforrester: Continuing with sync
21:04 jforrester@deploy2002: tgr and jforrester: Backport for Revert "Do not try to use Thumbor on beta" (T344605), nlwikivoyage: Drop Listings extension (T352696), Drop Listings extension from Wikivoyages where unused (T352719) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:02 jforrester@deploy2002: Started scap: Backport for Revert "Do not try to use Thumbor on beta" (T344605), nlwikivoyage: Drop Listings extension (T352696), Drop Listings extension from Wikivoyages where unused (T352719)
20:58 inflatador: bking@prometheus1006 disable puppet for troubleshooting T347355
20:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P54206 and previous config saved to /var/cache/conftool/dbconfig/20231205-205654-arnaudb.json
20:53 inflatador: bking@prometheus1006 reload prometheus-blackbox service T347355
20:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T348183)', diff saved to https://phabricator.wikimedia.org/P54205 and previous config saved to /var/cache/conftool/dbconfig/20231205-204147-arnaudb.json
20:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1219 (T348183)', diff saved to https://phabricator.wikimedia.org/P54204 and previous config saved to /var/cache/conftool/dbconfig/20231205-203158-arnaudb.json
20:31 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
20:31 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
20:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T348183)', diff saved to https://phabricator.wikimedia.org/P54203 and previous config saved to /var/cache/conftool/dbconfig/20231205-203136-arnaudb.json
20:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P54202 and previous config saved to /var/cache/conftool/dbconfig/20231205-201629-arnaudb.json
20:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P54201 and previous config saved to /var/cache/conftool/dbconfig/20231205-200123-arnaudb.json
19:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T348183)', diff saved to https://phabricator.wikimedia.org/P54200 and previous config saved to /var/cache/conftool/dbconfig/20231205-194616-arnaudb.json
19:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1218 (T348183)', diff saved to https://phabricator.wikimedia.org/P54199 and previous config saved to /var/cache/conftool/dbconfig/20231205-193627-arnaudb.json
19:36 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
19:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
19:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T348183)', diff saved to https://phabricator.wikimedia.org/P54198 and previous config saved to /var/cache/conftool/dbconfig/20231205-193604-arnaudb.json
19:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P54197 and previous config saved to /var/cache/conftool/dbconfig/20231205-192057-arnaudb.json
19:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P54196 and previous config saved to /var/cache/conftool/dbconfig/20231205-190551-arnaudb.json
18:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T348183)', diff saved to https://phabricator.wikimedia.org/P54195 and previous config saved to /var/cache/conftool/dbconfig/20231205-185044-arnaudb.json
18:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1207 (T348183)', diff saved to https://phabricator.wikimedia.org/P54194 and previous config saved to /var/cache/conftool/dbconfig/20231205-184108-arnaudb.json
18:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
18:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
18:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54193 and previous config saved to /var/cache/conftool/dbconfig/20231205-184045-arnaudb.json
18:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P54192 and previous config saved to /var/cache/conftool/dbconfig/20231205-182539-arnaudb.json
18:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS bullseye
18:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P54191 and previous config saved to /var/cache/conftool/dbconfig/20231205-181032-arnaudb.json
17:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54190 and previous config saved to /var/cache/conftool/dbconfig/20231205-175526-arnaudb.json
17:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
17:49 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
17:46 vgutierrez: rolling restart of text|secondary LVS on drmrs, effectively enabling IPIP encapsulation for ncredir@drmrs - T351069
17:29 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
17:29 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
17:29 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
17:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bullseye
17:28 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
17:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['testhost2001']
17:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['testhost2001']
17:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['testhost2001']
17:11 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bullseye
17:00 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1002.eqiad.wmnet with OS bookworm
16:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T348183)', diff saved to https://phabricator.wikimedia.org/P54189 and previous config saved to /var/cache/conftool/dbconfig/20231205-165503-arnaudb.json
16:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
16:54 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
16:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T348183)', diff saved to https://phabricator.wikimedia.org/P54188 and previous config saved to /var/cache/conftool/dbconfig/20231205-165439-arnaudb.json
16:52 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bullseye
16:52 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bullseye
16:47 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bullseye
16:42 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['testhost2001']
16:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host testhost2001.mgmt.codfw.wmnet with reboot policy FORCED
16:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P54187 and previous config saved to /var/cache/conftool/dbconfig/20231205-163933-arnaudb.json
16:37 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1002.eqiad.wmnet with reason: host reimage
16:34 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1002.eqiad.wmnet with reason: host reimage
16:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P54186 and previous config saved to /var/cache/conftool/dbconfig/20231205-162426-arnaudb.json
16:24 claime: Rolling back k8s-ingress-dse - restarting pybal on lvs1019 - T352639
16:18 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
16:18 claime: Rolling back k8s-ingress-dse - restarting pybal on lvs1020 - T352639
16:18 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
16:18 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
16:17 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
16:14 samtar@deploy2002: Finished scap: Backport for .well-known: Add F-Droid signature to assetlinks.json (T346951) (duration: 07m 53s)
16:11 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
16:09 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
16:09 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
16:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T348183)', diff saved to https://phabricator.wikimedia.org/P54185 and previous config saved to /var/cache/conftool/dbconfig/20231205-160920-arnaudb.json
16:09 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
16:08 samtar@deploy2002: samtar: Continuing with sync
16:08 samtar@deploy2002: samtar: Backport for .well-known: Add F-Droid signature to assetlinks.json (T346951) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:07 samtar@deploy2002: Started scap: Backport for .well-known: Add F-Droid signature to assetlinks.json (T346951)
16:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host testhost2001.mgmt.codfw.wmnet with reboot policy FORCED
16:00 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding testhost2001 to codfw - jhancock@cumin2002"
15:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding testhost2001 to codfw - jhancock@cumin2002"
15:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T348183)', diff saved to https://phabricator.wikimedia.org/P54184 and previous config saved to /var/cache/conftool/dbconfig/20231205-155858-arnaudb.json
15:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
15:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
15:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
15:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
15:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T348183)', diff saved to https://phabricator.wikimedia.org/P54183 and previous config saved to /var/cache/conftool/dbconfig/20231205-155814-arnaudb.json
15:57 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:56 hnowlan@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
15:56 hnowlan@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
15:56 hnowlan@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
15:56 hnowlan@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
15:53 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4040.ulsfo.wmnet
15:49 claime: sudo confctl select "service=kubesvc,cluster=dse-k8s" set/pooled=inactive - T352639
15:45 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4040.ulsfo.wmnet
15:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P54182 and previous config saved to /var/cache/conftool/dbconfig/20231205-154308-arnaudb.json
15:42 moritzm: installing monitoring-plugins bugfix updates from Bookworm point release
15:42 claime: Manually restarting pybal on lvs1020 - T352639
15:39 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1002.eqiad.wmnet with OS bookworm
15:31 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1471.eqiad.wmnet with OS bullseye
15:29 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sessionstore2005']
15:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2005']
15:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore2005.mgmt.codfw.wmnet with reboot policy FORCED
15:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sessionstore2006']
15:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2006']
15:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2005.mgmt.codfw.wmnet with reboot policy FORCED
15:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P54181 and previous config saved to /var/cache/conftool/dbconfig/20231205-152801-arnaudb.json
15:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host aqs2001.codfw.wmnet
15:22 claime: Manually restarting pybal on lvs1019 - T352639
15:21 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
15:20 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
15:18 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
15:17 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
15:16 claime: Manually restarting pybal on lvs1020 - T352639
15:15 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
15:15 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host aqs2001.codfw.wmnet
15:15 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
15:13 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1471.eqiad.wmnet with reason: host reimage
15:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T348183)', diff saved to https://phabricator.wikimedia.org/P54180 and previous config saved to /var/cache/conftool/dbconfig/20231205-151255-arnaudb.json
15:12 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
15:11 cgoubert@cumin1001: END (FAIL) - Cookbook sre.loadbalancer.restart-pybal (exit_code=1) rolling-restart of pybal on P{lvs[1018,1020].eqiad.wmnet} and A:lvs (T352639)
15:11 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
15:10 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1471.eqiad.wmnet with reason: host reimage
15:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore2005.mgmt.codfw.wmnet with reboot policy FORCED
15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
15:06 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
15:06 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
15:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sessionstore2006.mgmt.codfw.wmnet with reboot policy FORCED
15:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4043.ulsfo.wmnet
15:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T348183)', diff saved to https://phabricator.wikimedia.org/P54179 and previous config saved to /var/cache/conftool/dbconfig/20231205-150243-arnaudb.json
15:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
15:02 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
15:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T348183)', diff saved to https://phabricator.wikimedia.org/P54178 and previous config saved to /var/cache/conftool/dbconfig/20231205-150220-arnaudb.json
15:01 cgoubert@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs[1018,1020].eqiad.wmnet} and A:lvs (T352639)
14:58 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
14:58 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1471.eqiad.wmnet with OS bullseye
14:57 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
14:57 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
14:57 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
14:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2006.mgmt.codfw.wmnet with reboot policy FORCED
14:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2005.mgmt.codfw.wmnet with reboot policy FORCED
14:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sessionstore2004.mgmt.codfw.wmnet with reboot policy FORCED
14:54 brouberol: adding k8s-ingress-dse backend to LVS - T352639
14:52 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4043.ulsfo.wmnet
14:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P54177 and previous config saved to /var/cache/conftool/dbconfig/20231205-144714-arnaudb.json
14:45 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
14:45 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
14:44 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding sessionstore2004-6 to codfw - jhancock@cumin2002"
14:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding sessionstore2004-6 to codfw - jhancock@cumin2002"
14:41 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:41 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:41 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:40 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:40 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: redis::misc::master
14:38 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ceph2002']
14:35 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:32 urbanecm@deploy2002: Finished scap: Backport for User impact: update quantizeViews to process small series of view data (T352349), Add maintenance script to import existing files to scan table (T350863), Only allow drawing and bitmap media types to be scanned (T352234) (duration: 08m 55s)
14:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P54176 and previous config saved to /var/cache/conftool/dbconfig/20231205-143207-arnaudb.json
14:30 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: redis::misc::master
14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2002']
14:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
14:26 urbanecm@deploy2002: kharlan and urbanecm: Continuing with sync
14:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
14:25 urbanecm@deploy2002: kharlan and urbanecm: Backport for User impact: update quantizeViews to process small series of view data (T352349), Add maintenance script to import existing files to scan table (T350863), Only allow drawing and bitmap media types to be scanned (T352234) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
14:23 urbanecm@deploy2002: Started scap: Backport for User impact: update quantizeViews to process small series of view data (T352349), Add maintenance script to import existing files to scan table (T350863), Only allow drawing and bitmap media types to be scanned (T352234)
14:20 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T348183)', diff saved to https://phabricator.wikimedia.org/P54175 and previous config saved to /var/cache/conftool/dbconfig/20231205-141701-arnaudb.json
14:13 urbanecm@deploy2002: Finished scap: Backport for Growth: Enable Welcome survey user research for ar/en/es (T351266) (duration: 09m 33s)
14:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T348183)', diff saved to https://phabricator.wikimedia.org/P54174 and previous config saved to /var/cache/conftool/dbconfig/20231205-140742-arnaudb.json
14:07 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
14:07 urbanecm@deploy2002: urbanecm: Continuing with sync
14:07 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
14:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T348183)', diff saved to https://phabricator.wikimedia.org/P54173 and previous config saved to /var/cache/conftool/dbconfig/20231205-140720-arnaudb.json
14:06 urbanecm@deploy2002: urbanecm: Backport for Growth: Enable Welcome survey user research for ar/en/es (T351266) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:06 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
14:05 elukey@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
14:04 urbanecm@deploy2002: Started scap: Backport for Growth: Enable Welcome survey user research for ar/en/es (T351266)
14:03 moritzm: installing cups security updates
13:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P54172 and previous config saved to /var/cache/conftool/dbconfig/20231205-135213-arnaudb.json
13:51 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4048.ulsfo.wmnet
13:50 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1078.eqiad.wmnet with OS bullseye
13:50 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
13:48 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
13:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1079.eqiad.wmnet with OS bullseye
13:48 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
13:48 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1470.eqiad.wmnet with OS bullseye
13:44 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
13:43 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1465.eqiad.wmnet with OS bullseye
13:41 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4048.ulsfo.wmnet
13:38 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1464.eqiad.wmnet with OS bullseye
13:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P54171 and previous config saved to /var/cache/conftool/dbconfig/20231205-133706-arnaudb.json
13:30 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1470.eqiad.wmnet with reason: host reimage
13:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1078.eqiad.wmnet with reason: host reimage
13:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1076.eqiad.wmnet with OS bullseye
13:27 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
13:26 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1470.eqiad.wmnet with reason: host reimage
13:26 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
13:24 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1465.eqiad.wmnet with reason: host reimage
13:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1079.eqiad.wmnet with reason: host reimage
13:24 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1078.eqiad.wmnet with reason: host reimage
13:23 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1079.eqiad.wmnet with reason: host reimage
13:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T348183)', diff saved to https://phabricator.wikimedia.org/P54169 and previous config saved to /var/cache/conftool/dbconfig/20231205-132200-arnaudb.json
13:21 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1465.eqiad.wmnet with reason: host reimage
13:21 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1464.eqiad.wmnet with reason: host reimage
13:18 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1464.eqiad.wmnet with reason: host reimage
13:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: redis::misc::slave
13:14 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1470.eqiad.wmnet with OS bullseye
13:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T348183)', diff saved to https://phabricator.wikimedia.org/P54168 and previous config saved to /var/cache/conftool/dbconfig/20231205-131240-arnaudb.json
13:12 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
13:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
13:10 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1078.eqiad.wmnet with OS bullseye
13:09 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1079.eqiad.wmnet with OS bullseye
13:08 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1465.eqiad.wmnet with OS bullseye
13:07 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1076.eqiad.wmnet with reason: host reimage
13:06 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2435.codfw.wmnet with OS bullseye
13:06 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1464.eqiad.wmnet with OS bullseye
13:04 cmooney@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:04 cmooney@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update entry for sretest2003. - cmooney@cumin2002"
13:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
13:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
13:04 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1076.eqiad.wmnet with reason: host reimage
13:04 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
13:04 cmooney@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update entry for sretest2003. - cmooney@cumin2002"
13:03 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
13:02 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1463.eqiad.wmnet with OS bullseye
12:59 cmooney@cumin2002: START - Cookbook sre.dns.netbox
12:58 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2434.codfw.wmnet with OS bullseye
12:57 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: redis::misc::slave
12:56 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
12:56 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
12:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T348183)', diff saved to https://phabricator.wikimedia.org/P54167 and previous config saved to /var/cache/conftool/dbconfig/20231205-125641-arnaudb.json
12:53 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4042.ulsfo.wmnet
12:50 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2424.codfw.wmnet with OS bullseye
12:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1076.eqiad.wmnet with OS bullseye
12:47 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2435.codfw.wmnet with reason: host reimage
12:47 ladsgroup@deploy2002: Finished scap: Backport for Set migration of pagelinks on large wikis of s5 to read new (T351237) (duration: 12m 30s)
12:45 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
12:45 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2423.codfw.wmnet with OS bullseye
12:45 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
12:44 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1463.eqiad.wmnet with reason: host reimage
12:42 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2435.codfw.wmnet with reason: host reimage
12:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P54165 and previous config saved to /var/cache/conftool/dbconfig/20231205-124134-arnaudb.json
12:41 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1463.eqiad.wmnet with reason: host reimage
12:40 ladsgroup@deploy2002: ladsgroup: Continuing with sync
12:39 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2434.codfw.wmnet with reason: host reimage
12:37 ladsgroup@deploy2002: ladsgroup: Backport for Set migration of pagelinks on large wikis of s5 to read new (T351237) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:36 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2434.codfw.wmnet with reason: host reimage
12:34 ladsgroup@deploy2002: Started scap: Backport for Set migration of pagelinks on large wikis of s5 to read new (T351237)
12:32 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4042.ulsfo.wmnet
12:31 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2424.codfw.wmnet with reason: host reimage
12:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4051.ulsfo.wmnet
12:28 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1463.eqiad.wmnet with OS bullseye
12:28 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2424.codfw.wmnet with reason: host reimage
12:27 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
12:26 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2423.codfw.wmnet with reason: host reimage
12:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P54164 and previous config saved to /var/cache/conftool/dbconfig/20231205-122628-arnaudb.json
12:26 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
12:25 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw2435.codfw.wmnet with OS bullseye
12:24 moritzm: installing unbound bugfix updates from Bookworm point release
12:23 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2423.codfw.wmnet with reason: host reimage
12:22 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4051.ulsfo.wmnet
12:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4039.ulsfo.wmnet
12:18 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw2434.codfw.wmnet with OS bullseye
12:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T348183)', diff saved to https://phabricator.wikimedia.org/P54163 and previous config saved to /var/cache/conftool/dbconfig/20231205-121121-arnaudb.json
12:10 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw2424.codfw.wmnet with OS bullseye
12:07 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:07 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:06 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw2423.codfw.wmnet with OS bullseye
12:04 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4039.ulsfo.wmnet
12:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T348183)', diff saved to https://phabricator.wikimedia.org/P54162 and previous config saved to /var/cache/conftool/dbconfig/20231205-120206-arnaudb.json
12:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
12:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
12:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T348183)', diff saved to https://phabricator.wikimedia.org/P54161 and previous config saved to /var/cache/conftool/dbconfig/20231205-120145-arnaudb.json
12:01 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4049.ulsfo.wmnet
11:53 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
11:52 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
11:51 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
11:51 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
11:50 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4049.ulsfo.wmnet
11:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P54160 and previous config saved to /var/cache/conftool/dbconfig/20231205-114638-arnaudb.json
11:40 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
11:40 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
11:40 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
11:40 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
11:38 ladsgroup@deploy2002: Finished scap: Backport for Bump ParserCache TTL back to 30 days (T280604) (duration: 07m 47s)
11:33 pfischer@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
11:32 pfischer@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
11:32 ladsgroup@deploy2002: ladsgroup: Continuing with sync
11:32 ladsgroup@deploy2002: ladsgroup: Backport for Bump ParserCache TTL back to 30 days (T280604) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P54159 and previous config saved to /var/cache/conftool/dbconfig/20231205-113132-arnaudb.json
11:30 ladsgroup@deploy2002: Started scap: Backport for Bump ParserCache TTL back to 30 days (T280604)
11:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1023.eqiad.wmnet with OS bookworm
11:17 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
11:16 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
11:16 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
11:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T348183)', diff saved to https://phabricator.wikimedia.org/P54158 and previous config saved to /var/cache/conftool/dbconfig/20231205-111625-arnaudb.json
11:16 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
11:15 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
11:15 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
11:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1023.eqiad.wmnet with reason: host reimage
11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1023.eqiad.wmnet with reason: host reimage
11:08 hnowlan@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
11:08 hnowlan@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
11:07 hnowlan@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
11:07 hnowlan@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
11:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T348183)', diff saved to https://phabricator.wikimedia.org/P54157 and previous config saved to /var/cache/conftool/dbconfig/20231205-110448-arnaudb.json
11:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
11:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
11:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T348183)', diff saved to https://phabricator.wikimedia.org/P54156 and previous config saved to /var/cache/conftool/dbconfig/20231205-110426-arnaudb.json
11:02 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be1002.eqiad.wmnet with OS bookworm
10:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bookworm
10:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P54155 and previous config saved to /var/cache/conftool/dbconfig/20231205-104919-arnaudb.json
10:45 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
10:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P54154 and previous config saved to /var/cache/conftool/dbconfig/20231205-103413-arnaudb.json
10:21 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1002.eqiad.wmnet with OS bookworm
10:20 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1003.eqiad.wmnet with OS bookworm
10:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T348183)', diff saved to https://phabricator.wikimedia.org/P54153 and previous config saved to /var/cache/conftool/dbconfig/20231205-101906-arnaudb.json
10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T348183)', diff saved to https://phabricator.wikimedia.org/P54152 and previous config saved to /var/cache/conftool/dbconfig/20231205-100744-arnaudb.json
10:07 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
10:07 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
10:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T348183)', diff saved to https://phabricator.wikimedia.org/P54151 and previous config saved to /var/cache/conftool/dbconfig/20231205-100722-arnaudb.json
10:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15305
10:02 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
10:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15305
09:57 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 63927
09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P54150 and previous config saved to /var/cache/conftool/dbconfig/20231205-095215-arnaudb.json
09:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 63927
09:42 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bookworm
09:37 brouberol: running authdns-update on dns1004.wikimedia.org - T352639
09:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P54149 and previous config saved to /var/cache/conftool/dbconfig/20231205-093709-arnaudb.json
09:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T348183)', diff saved to https://phabricator.wikimedia.org/P54148 and previous config saved to /var/cache/conftool/dbconfig/20231205-092202-arnaudb.json
09:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T348183)', diff saved to https://phabricator.wikimedia.org/P54147 and previous config saved to /var/cache/conftool/dbconfig/20231205-091232-arnaudb.json
09:12 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
09:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
09:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 58952
09:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 58952
09:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
09:03 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
08:59 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
08:26 marostegui: Failover m2-master dbproxy1023.eqiad.wmnet -> dbproxy1025.eqiad.wmnet T351864
06:55 vgutierrez: rolling restart of text|secondary LVS on eqsin effectively enabling IPIP encapsulation for ncredir@eqsin - T351069
06:23 marostegui: Failover m5 from db1119 to db1176 - T352631
06:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2135,2160].codfw.wmnet,db[1119,1176,1217].eqiad.wmnet with reason: m5 master switch T352631
06:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2135,2160].codfw.wmnet,db[1119,1176,1217].eqiad.wmnet with reason: m5 master switch T352631
01:18 mutante: LDAP - added user xqt to group nda (T348520)
01:12 ejegg: payments-wiki upgraded from 5284fc99 to 1d24dc90
00:06 eevans@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host restbase2028.codfw.wmnet

2023-12-04

23:53 eevans@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host restbase2028.codfw.wmnet
23:52 eevans@cumin1001: START - Cookbook sre.puppet.migrate-host for host restbase2028.codfw.wmnet
22:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T348183)', diff saved to https://phabricator.wikimedia.org/P54146 and previous config saved to /var/cache/conftool/dbconfig/20231204-225336-arnaudb.json
22:53 eileen: civicrm upgraded from 83816165 to 297a091d
22:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P54145 and previous config saved to /var/cache/conftool/dbconfig/20231204-223830-arnaudb.json
22:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P54144 and previous config saved to /var/cache/conftool/dbconfig/20231204-222323-arnaudb.json
22:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T348183)', diff saved to https://phabricator.wikimedia.org/P54142 and previous config saved to /var/cache/conftool/dbconfig/20231204-220817-arnaudb.json
22:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2189 (T348183)', diff saved to https://phabricator.wikimedia.org/P54141 and previous config saved to /var/cache/conftool/dbconfig/20231204-220345-arnaudb.json
22:03 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
22:03 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
22:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T348183)', diff saved to https://phabricator.wikimedia.org/P54140 and previous config saved to /var/cache/conftool/dbconfig/20231204-220322-arnaudb.json
21:58 ebernhardson@deploy2002: Finished scap: Backport for Always load transcode state from db when opting in to primary db (duration: 08m 37s)
21:52 ebernhardson@deploy2002: ebernhardson and brion: Continuing with sync
21:51 ebernhardson@deploy2002: ebernhardson and brion: Backport for Always load transcode state from db when opting in to primary db synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:50 ebernhardson@deploy2002: Started scap: Backport for Always load transcode state from db when opting in to primary db
21:49 ebernhardson@deploy2002: Finished scap: Backport for cirrus: Enable event bus bridge on more wikis (T352335) (duration: 09m 23s)
21:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P54138 and previous config saved to /var/cache/conftool/dbconfig/20231204-214816-arnaudb.json
21:47 ryankemper: T351503 Setting partition count to 5: `ryankemper@kafka-main2001:~$ kafka topics --alter --topic codfw.mediawiki.cirrussearch.page_rerender.v1 --partitions 5`
21:47 ryankemper: T351503 Setting partition count to 5: `ryankemper@kafka-main2001:~$ kafka topics --alter --topic eqiad.mediawiki.cirrussearch.page_rerender.v1 --partitions 5`
21:42 ebernhardson@deploy2002: ebernhardson: Continuing with sync
21:41 ebernhardson@deploy2002: ebernhardson: Backport for cirrus: Enable event bus bridge on more wikis (T352335) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:39 ebernhardson@deploy2002: Started scap: Backport for cirrus: Enable event bus bridge on more wikis (T352335)
21:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P54137 and previous config saved to /var/cache/conftool/dbconfig/20231204-213309-arnaudb.json
21:27 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:27 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
21:19 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1077.eqiad.wmnet with OS bullseye
21:19 pt1979@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
21:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T348183)', diff saved to https://phabricator.wikimedia.org/P54136 and previous config saved to /var/cache/conftool/dbconfig/20231204-211803-arnaudb.json
21:14 pt1979@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
21:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T348183)', diff saved to https://phabricator.wikimedia.org/P54135 and previous config saved to /var/cache/conftool/dbconfig/20231204-211305-arnaudb.json
21:12 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
21:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
21:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54134 and previous config saved to /var/cache/conftool/dbconfig/20231204-211241-arnaudb.json
21:09 ryankemper: T351503 Setting partition count to 5: `ryankemper@kafka-main1001:~$ kafka topics --alter --topic codfw.mediawiki.cirrussearch.page_rerender.v1 --partitions 5`
21:06 ryankemper: T351503 Setting partition count to 5: `ryankemper@kafka-main1001:~$ kafka topics --alter --topic eqiad.mediawiki.cirrussearch.page_rerender.v1 --partitions 5`
20:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P54133 and previous config saved to /var/cache/conftool/dbconfig/20231204-205735-arnaudb.json
20:53 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1077.eqiad.wmnet with reason: host reimage
20:50 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1077.eqiad.wmnet with reason: host reimage
20:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P54132 and previous config saved to /var/cache/conftool/dbconfig/20231204-204228-arnaudb.json
20:36 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1077.eqiad.wmnet with OS bullseye
20:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54131 and previous config saved to /var/cache/conftool/dbconfig/20231204-202722-arnaudb.json
19:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1079.eqiad.wmnet with OS bullseye
19:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1076.eqiad.wmnet with OS bullseye
19:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1078.eqiad.wmnet with OS bullseye
19:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1077.eqiad.wmnet with OS bullseye
19:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54130 and previous config saved to /var/cache/conftool/dbconfig/20231204-194103-arnaudb.json
19:40 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
19:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
19:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T348183)', diff saved to https://phabricator.wikimedia.org/P54129 and previous config saved to /var/cache/conftool/dbconfig/20231204-194039-arnaudb.json
19:37 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:37 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P54128 and previous config saved to /var/cache/conftool/dbconfig/20231204-192532-arnaudb.json
19:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1076.eqiad.wmnet with OS bullseye
19:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1078.eqiad.wmnet with OS bullseye
19:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1079.eqiad.wmnet with OS bullseye
19:20 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1077.eqiad.wmnet with OS bullseye
19:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P54126 and previous config saved to /var/cache/conftool/dbconfig/20231204-191026-arnaudb.json
19:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1079.eqiad.wmnet with OS bullseye
19:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1078.eqiad.wmnet with OS bullseye
19:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1077.eqiad.wmnet with OS bullseye
18:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T348183)', diff saved to https://phabricator.wikimedia.org/P54125 and previous config saved to /var/cache/conftool/dbconfig/20231204-185519-arnaudb.json
18:52 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1076.eqiad.wmnet with OS bullseye
18:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1077.eqiad.wmnet with OS bullseye
18:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1078.eqiad.wmnet with OS bullseye
18:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1079.eqiad.wmnet with OS bullseye
18:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T348183)', diff saved to https://phabricator.wikimedia.org/P54124 and previous config saved to /var/cache/conftool/dbconfig/20231204-184630-arnaudb.json
18:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
18:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
18:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54123 and previous config saved to /var/cache/conftool/dbconfig/20231204-184607-arnaudb.json
18:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P54122 and previous config saved to /var/cache/conftool/dbconfig/20231204-183100-arnaudb.json
18:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P54121 and previous config saved to /var/cache/conftool/dbconfig/20231204-181554-arnaudb.json
18:02 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1077.eqiad.wmnet with OS bullseye
18:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54120 and previous config saved to /var/cache/conftool/dbconfig/20231204-180047-arnaudb.json
17:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1076.eqiad.wmnet with OS bullseye
17:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1078.eqiad.wmnet with OS bullseye
17:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54119 and previous config saved to /var/cache/conftool/dbconfig/20231204-175448-arnaudb.json
17:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
17:54 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
17:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T348183)', diff saved to https://phabricator.wikimedia.org/P54118 and previous config saved to /var/cache/conftool/dbconfig/20231204-175426-arnaudb.json
17:41 ladsgroup@deploy2002: Finished scap: Backport for Category: Stop locking thousands of rows (T352628) (duration: 08m 07s)
17:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P54117 and previous config saved to /var/cache/conftool/dbconfig/20231204-173919-arnaudb.json
17:35 ladsgroup@deploy2002: ladsgroup: Continuing with sync
17:34 ladsgroup@deploy2002: ladsgroup: Backport for Category: Stop locking thousands of rows (T352628) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:33 ladsgroup@deploy2002: Started scap: Backport for Category: Stop locking thousands of rows (T352628)
17:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P54116 and previous config saved to /var/cache/conftool/dbconfig/20231204-172413-arnaudb.json
17:19 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1076']
17:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
17:18 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1079']
17:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1079']
17:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1079']
17:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1079']
17:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1079']
17:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1079']
17:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1079']
17:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1079']
17:14 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1079']
17:12 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1076']
17:12 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
17:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1076']
17:09 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
17:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1076']
17:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T348183)', diff saved to https://phabricator.wikimedia.org/P54115 and previous config saved to /var/cache/conftool/dbconfig/20231204-170906-arnaudb.json
17:09 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1078.eqiad.wmnet with OS bullseye
17:08 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1077.eqiad.wmnet with OS bullseye
17:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T348183)', diff saved to https://phabricator.wikimedia.org/P54114 and previous config saved to /var/cache/conftool/dbconfig/20231204-170604-arnaudb.json
17:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
17:05 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
17:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
17:05 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
17:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T348183)', diff saved to https://phabricator.wikimedia.org/P54113 and previous config saved to /var/cache/conftool/dbconfig/20231204-170525-arnaudb.json
16:52 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 45s)
16:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P54112 and previous config saved to /var/cache/conftool/dbconfig/20231204-165018-arnaudb.json
16:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 33604
16:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 33604
16:44 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 40s)
16:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P54111 and previous config saved to /var/cache/conftool/dbconfig/20231204-163511-arnaudb.json
16:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T348183)', diff saved to https://phabricator.wikimedia.org/P54110 and previous config saved to /var/cache/conftool/dbconfig/20231204-162005-arnaudb.json
16:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T348183)', diff saved to https://phabricator.wikimedia.org/P54109 and previous config saved to /var/cache/conftool/dbconfig/20231204-161408-arnaudb.json
16:14 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
16:13 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
16:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54108 and previous config saved to /var/cache/conftool/dbconfig/20231204-161346-arnaudb.json
15:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P54107 and previous config saved to /var/cache/conftool/dbconfig/20231204-155840-arnaudb.json
15:56 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1076.eqiad.wmnet with OS bullseye
15:48 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
15:48 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
15:47 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
15:47 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
15:46 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
15:45 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
15:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P54105 and previous config saved to /var/cache/conftool/dbconfig/20231204-154333-arnaudb.json
15:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54104 and previous config saved to /var/cache/conftool/dbconfig/20231204-152826-arnaudb.json
15:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1077']
15:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-be1078']
15:03 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1079']
15:02 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1077']
15:02 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1077']
15:02 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1078']
15:02 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1077']
15:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
14:53 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4046.ulsfo.wmnet
14:51 vgutierrez: upload tcp-mss-clamper 0.4 to apt.wm.o (bookworm)
14:50 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be1077
14:50 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ms-be1077
14:47 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1076.eqiad.wmnet with OS bullseye
14:46 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4046.ulsfo.wmnet
14:46 Lucas_WMDE: UTC afternoon backport+config window done
14:46 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Create new namespaces and namespace aliases for bd.wikimedia.org (T351903) (duration: 11m 48s)
14:44 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cp4038.ulsfo.wmnet
14:43 sukhe: running authdns-update for CR 979976 [revert of T349665]
14:40 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and mdsshakil: Continuing with sync
14:37 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cp4038.ulsfo.wmnet
14:36 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and mdsshakil: Backport for Create new namespaces and namespace aliases for bd.wikimedia.org (T351903) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:34 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Create new namespaces and namespace aliases for bd.wikimedia.org (T351903)
14:33 sukhe: running authdns-update for T352579
14:32 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Enable read new for event tables migration on testwiki (T341829) (duration: 10m 42s)
14:32 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
14:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T348183)', diff saved to https://phabricator.wikimedia.org/P54103 and previous config saved to /var/cache/conftool/dbconfig/20231204-142754-arnaudb.json
14:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
14:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
14:25 lucaswerkmeister-wmde@deploy2002: dreamyjazz and lucaswerkmeister-wmde: Continuing with sync
14:24 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
14:24 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
14:22 lucaswerkmeister-wmde@deploy2002: dreamyjazz and lucaswerkmeister-wmde: Backport for Enable read new for event tables migration on testwiki (T341829) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
14:21 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Enable read new for event tables migration on testwiki (T341829)
14:21 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
14:19 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
14:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
14:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T348183)', diff saved to https://phabricator.wikimedia.org/P54102 and previous config saved to /var/cache/conftool/dbconfig/20231204-141848-arnaudb.json
14:15 jforrester@deploy2002: Finished scap: Backport for wikifunctionswiki: Disable thumbnail in Vector search (T352532), wikifunctionswiki: Add ability for sysops to manage Functioneer (T352495) (duration: 07m 41s)
14:10 jforrester@deploy2002: jforrester and terasail: Continuing with sync
14:09 jforrester@deploy2002: jforrester and terasail: Backport for wikifunctionswiki: Disable thumbnail in Vector search (T352532), wikifunctionswiki: Add ability for sysops to manage Functioneer (T352495) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:08 jforrester@deploy2002: Started scap: Backport for wikifunctionswiki: Disable thumbnail in Vector search (T352532), wikifunctionswiki: Add ability for sysops to manage Functioneer (T352495)
14:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P54101 and previous config saved to /var/cache/conftool/dbconfig/20231204-140341-arnaudb.json
13:59 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
13:59 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
13:58 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
13:57 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
13:56 moritzm: installing postgresql-13 security updates
13:52 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
13:52 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
13:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P54100 and previous config saved to /var/cache/conftool/dbconfig/20231204-134835-arnaudb.json
13:43 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
13:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T348183)', diff saved to https://phabricator.wikimedia.org/P54099 and previous config saved to /var/cache/conftool/dbconfig/20231204-133328-arnaudb.json
13:30 moritzm: installing dbus security updates on buster
13:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1222 (T348183)', diff saved to https://phabricator.wikimedia.org/P54098 and previous config saved to /var/cache/conftool/dbconfig/20231204-132859-arnaudb.json
13:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
13:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
13:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T348183)', diff saved to https://phabricator.wikimedia.org/P54097 and previous config saved to /var/cache/conftool/dbconfig/20231204-132836-arnaudb.json
13:22 moritzm: installing libde265 security updates
13:22 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
13:22 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
13:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P54096 and previous config saved to /var/cache/conftool/dbconfig/20231204-131329-arnaudb.json
13:06 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
13:05 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
13:05 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
13:04 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
12:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P54095 and previous config saved to /var/cache/conftool/dbconfig/20231204-125823-arnaudb.json
12:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T348183)', diff saved to https://phabricator.wikimedia.org/P54094 and previous config saved to /var/cache/conftool/dbconfig/20231204-124316-arnaudb.json
12:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T348183)', diff saved to https://phabricator.wikimedia.org/P54093 and previous config saved to /var/cache/conftool/dbconfig/20231204-124037-arnaudb.json
12:40 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
12:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
12:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54092 and previous config saved to /var/cache/conftool/dbconfig/20231204-124015-arnaudb.json
12:35 urbanecm@deploy2002: Finished scap: Backport for User impact: sort datestring keys to ascending alphanumeric order (T352349 T351898) (duration: 09m 43s)
12:29 urbanecm@deploy2002: urbanecm: Continuing with sync
12:28 urbanecm@deploy2002: urbanecm: Backport for User impact: sort datestring keys to ascending alphanumeric order (T352349 T351898) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host an-druid1005.eqiad.wmnet
12:25 urbanecm@deploy2002: Started scap: Backport for User impact: sort datestring keys to ascending alphanumeric order (T352349 T351898)
12:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P54091 and previous config saved to /var/cache/conftool/dbconfig/20231204-122508-arnaudb.json
12:19 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:19 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host an-druid1005.eqiad.wmnet
12:18 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1027.eqiad.wmnet with OS bookworm
12:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P54090 and previous config saved to /var/cache/conftool/dbconfig/20231204-121002-arnaudb.json
12:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host druid1011.eqiad.wmnet
12:00 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host druid1011.eqiad.wmnet
11:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1027.eqiad.wmnet with reason: host reimage
11:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54089 and previous config saved to /var/cache/conftool/dbconfig/20231204-115455-arnaudb.json
11:54 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2422.codfw.wmnet with OS bullseye
11:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1027.eqiad.wmnet with reason: host reimage
11:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T348183)', diff saved to https://phabricator.wikimedia.org/P54088 and previous config saved to /var/cache/conftool/dbconfig/20231204-115217-arnaudb.json
11:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
11:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
11:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T348183)', diff saved to https://phabricator.wikimedia.org/P54087 and previous config saved to /var/cache/conftool/dbconfig/20231204-115154-arnaudb.json
11:51 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1462.eqiad.wmnet with OS bullseye
11:43 elukey@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
11:43 elukey@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
11:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 44592
11:42 elukey@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
11:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 44592
11:42 elukey@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
11:40 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
11:39 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
11:39 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1027.eqiad.wmnet with OS bookworm
11:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P54086 and previous config saved to /var/cache/conftool/dbconfig/20231204-113648-arnaudb.json
11:36 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2422.codfw.wmnet with reason: host reimage
11:33 kamila@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1462.eqiad.wmnet with reason: host reimage
11:32 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2422.codfw.wmnet with reason: host reimage
11:30 kamila@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1462.eqiad.wmnet with reason: host reimage
11:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P54085 and previous config saved to /var/cache/conftool/dbconfig/20231204-112141-arnaudb.json
11:17 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw1462.eqiad.wmnet with OS bullseye
11:15 kamila@cumin1001: START - Cookbook sre.hosts.reimage for host mw2422.codfw.wmnet with OS bullseye
11:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: eventschemas::service
11:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T348183)', diff saved to https://phabricator.wikimedia.org/P54084 and previous config saved to /var/cache/conftool/dbconfig/20231204-110635-arnaudb.json
11:02 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T348183)', diff saved to https://phabricator.wikimedia.org/P54083 and previous config saved to /var/cache/conftool/dbconfig/20231204-110156-arnaudb.json
11:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
11:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
11:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54082 and previous config saved to /var/cache/conftool/dbconfig/20231204-110134-arnaudb.json
10:54 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: eventschemas::service
10:51 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:51 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add service records for the k8s-ingress-dse endpoints - btullis@cumin1001"
10:50 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add service records for the k8s-ingress-dse endpoints - btullis@cumin1001"
10:48 btullis@cumin1001: START - Cookbook sre.dns.netbox
10:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P54081 and previous config saved to /var/cache/conftool/dbconfig/20231204-104628-arnaudb.json
10:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 23856
10:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 23856
10:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 63927
10:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 63927
10:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 31898
10:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 31898
10:37 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58952
10:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58952
10:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 44592
10:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 44592
10:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4800
10:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4800
10:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 33604
10:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 33604
10:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 142505
10:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 142505
10:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 398446
10:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 398446
10:32 jayme: upgrade istio (buster -> bullseye) on wikikube codfw - T351933
10:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15305
10:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15305
10:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19165
10:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P54080 and previous config saved to /var/cache/conftool/dbconfig/20231204-103121-arnaudb.json
10:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 19165
10:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 237
10:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 237
10:28 jayme: upgrade istio (buster -> bullseye) on wikikube eqiad - T351933
10:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 35 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
10:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 35 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
10:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1022.eqiad.wmnet with OS bookworm
10:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 138997
10:17 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 138997
10:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54079 and previous config saved to /var/cache/conftool/dbconfig/20231204-101615-arnaudb.json
10:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54078 and previous config saved to /var/cache/conftool/dbconfig/20231204-101143-arnaudb.json
10:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
10:11 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
10:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T348183)', diff saved to https://phabricator.wikimedia.org/P54077 and previous config saved to /var/cache/conftool/dbconfig/20231204-101120-arnaudb.json
10:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1022.eqiad.wmnet with reason: host reimage
09:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1022.eqiad.wmnet with reason: host reimage
09:58 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy GRACEFUL
09:57 godog: roll-restart prometheus/k8s to apply size-based retention - T351179
09:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P54076 and previous config saved to /var/cache/conftool/dbconfig/20231204-095614-arnaudb.json
09:49 volans@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1022.mgmt.eqiad.wmnet with reboot policy GRACEFUL
09:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P54075 and previous config saved to /var/cache/conftool/dbconfig/20231204-094107-arnaudb.json
09:36 elukey: upgrade istio (buster -> bullseye) on ml-serve-codfw - T351933
09:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T348183)', diff saved to https://phabricator.wikimedia.org/P54074 and previous config saved to /var/cache/conftool/dbconfig/20231204-092600-arnaudb.json
09:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T348183)', diff saved to https://phabricator.wikimedia.org/P54073 and previous config saved to /var/cache/conftool/dbconfig/20231204-092136-arnaudb.json
09:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
09:21 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
09:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
09:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
09:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54072 and previous config saved to /var/cache/conftool/dbconfig/20231204-092054-arnaudb.json
09:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P54070 and previous config saved to /var/cache/conftool/dbconfig/20231204-090547-arnaudb.json
08:58 elukey: upgrade istio (buster -> bullseye) on ml-serve-eqiad - T351933
08:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P54069 and previous config saved to /var/cache/conftool/dbconfig/20231204-085041-arnaudb.json
08:50 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bookworm
08:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM moscovium.eqiad.wmnet
08:48 elukey: upgrade istio (buster -> bullseye) on aux-k8s-eqiad - T351933
08:45 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bookworm
08:43 elukey: upgrade istio (buster -> bullseye) on dse-k8s-eqiad - T351933
08:39 urbanecm@deploy2002: Finished scap: Backport for hewikivoyage: add tagline (T351981), azwiki: Enable $wgMinervaEnableSiteNotice (T352621), trwikivoyage: update wordmark (T352329) (duration: 09m 49s)
08:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54068 and previous config saved to /var/cache/conftool/dbconfig/20231204-083534-arnaudb.json
08:33 urbanecm@deploy2002: urbanecm and anzx: Continuing with sync
08:31 urbanecm@deploy2002: urbanecm and anzx: Backport for hewikivoyage: add tagline (T351981), azwiki: Enable $wgMinervaEnableSiteNotice (T352621), trwikivoyage: update wordmark (T352329) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T348183)', diff saved to https://phabricator.wikimedia.org/P54067 and previous config saved to /var/cache/conftool/dbconfig/20231204-083102-arnaudb.json
08:30 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
08:30 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
08:29 urbanecm@deploy2002: Started scap: Backport for hewikivoyage: add tagline (T351981), azwiki: Enable $wgMinervaEnableSiteNotice (T352621), trwikivoyage: update wordmark (T352329)
08:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
08:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
08:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T348183)', diff saved to https://phabricator.wikimedia.org/P54066 and previous config saved to /var/cache/conftool/dbconfig/20231204-082758-arnaudb.json
08:25 oblivian@deploy2002: Finished scap: Backport for Add throttle rule for editathon (T352569) (duration: 18m 04s)
08:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM moscovium.eqiad.wmnet
08:23 _joe_: clearing throttle cache for T352569
08:18 oblivian@deploy2002: oblivian: Continuing with sync
08:17 oblivian@deploy2002: oblivian: Backport for Add throttle rule for editathon (T352569) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P54065 and previous config saved to /var/cache/conftool/dbconfig/20231204-081251-arnaudb.json
08:11 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bookworm
08:10 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bookworm
08:07 oblivian@deploy2002: Started scap: Backport for Add throttle rule for editathon (T352569)
07:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P54064 and previous config saved to /var/cache/conftool/dbconfig/20231204-075745-arnaudb.json
07:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bookworm
07:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T348183)', diff saved to https://phabricator.wikimedia.org/P54063 and previous config saved to /var/cache/conftool/dbconfig/20231204-074238-arnaudb.json
07:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T348183)', diff saved to https://phabricator.wikimedia.org/P54062 and previous config saved to /var/cache/conftool/dbconfig/20231204-073957-arnaudb.json
07:39 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
07:39 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1176.eqiad.wmnet with OS bookworm
07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
07:07 kart_: Updated MinT to 2023-11-21-115852-production
07:03 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1176.eqiad.wmnet with OS bookworm
06:57 marostegui: Failover m5 from db1176 to db1119 - T332155
06:49 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
06:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2135,2160].codfw.wmnet,db[1119,1176,1217].eqiad.wmnet with reason: m5 master switch T352505
06:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2135,2160].codfw.wmnet,db[1119,1176,1217].eqiad.wmnet with reason: m5 master switch T352505
06:44 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
06:33 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
06:28 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
06:14 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
06:11 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
06:08 kart_: Updated cxserver to 2023-12-04-055024-production (T270060, T350773, T352620)
06:06 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
06:05 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
06:03 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
06:02 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
05:59 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
05:58 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
04:43 ryankemper: [WDQS] Clearing `BlazegraphFreeAllocatorsDecreasingRapidly` -> `ryankemper@wdqs1007:~$ sudo systemctl restart wdqs-blazegraph`
00:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1006.eqiad.wmnet
00:09 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1006.eqiad.wmnet

2023-12-02

01:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1078.eqiad.wmnet with OS bullseye
01:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1079.eqiad.wmnet with OS bullseye
01:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1077.eqiad.wmnet with OS bullseye
01:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1076.eqiad.wmnet with OS bullseye
00:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1078.eqiad.wmnet with OS bullseye
00:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1079.eqiad.wmnet with OS bullseye
00:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1077.eqiad.wmnet with OS bullseye
00:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1076.eqiad.wmnet with OS bullseye
00:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1076']
00:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
00:14 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1076']
00:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
00:14 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1076']
00:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
00:14 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-be1076']
00:14 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-be1076']
00:14 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-be1076']
00:14 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-be1076']
00:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
00:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
00:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
00:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1076']
00:13 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
00:13 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
00:13 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
00:12 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED

2023-12-01

22:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
22:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
22:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
22:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED
22:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
22:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
22:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED
22:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
22:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
22:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
22:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
22:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED
22:14 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:14 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
22:13 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ms-be - jclark@cumin1001"
22:11 jclark@cumin1001: START - Cookbook sre.dns.netbox
22:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
22:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
22:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
22:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED
21:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1079.mgmt.eqiad.wmnet with reboot policy FORCED
21:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1078.mgmt.eqiad.wmnet with reboot policy FORCED
21:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1077.mgmt.eqiad.wmnet with reboot policy FORCED
21:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1076.mgmt.eqiad.wmnet with reboot policy FORCED
21:31 cstone: payments-wiki upgraded from b37ab50e to 5284fc99
19:35 inflatador: bking@wdqs1006 rebooting unresponsive host
18:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ceph2001.codfw.wmnet with OS bullseye
17:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ceph2001.codfw.wmnet with OS bullseye
16:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ceph2001.codfw.wmnet with OS bullseye
16:39 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1005.eqiad.wmnet with OS bookworm
16:26 dancy@deploy2002: Installation of scap version "4.65.0" completed for 537 hosts
16:26 dancy@deploy2002: Installing scap version "4.65.0" for 537 hosts
16:25 dancy@deploy2002: install-world aborted: (duration: 00m 50s)
16:24 dancy@deploy2002: Installing scap version "4.65.0" for 569 hosts
16:24 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt1046.eqiad.wmnet
16:10 fnegri@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1046.eqiad.wmnet
16:07 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
16:04 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
16:01 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:01 akosiaris@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to scandium - akosiaris@cumin1001"
16:00 akosiaris@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Give AAAA and PTR records to scandium - akosiaris@cumin1001"
15:58 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
15:58 akosiaris: give AAAA and PTR records to scandium T271142
15:57 akosiaris: give AAAA and PTR records to all rdb hosts (only 50% had it previously)
15:56 dancy@deploy2002: Installing scap version "4.65.0" for 570 hosts
15:55 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:55 akosiaris@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add AAAA records to the rest of the 50% of rdb hosts - akosiaris@cumin1001"
15:54 akosiaris@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add AAAA records to the rest of the 50% of rdb hosts - akosiaris@cumin1001"
15:52 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
15:51 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[1009-1010].eqiad.wmnet
15:51 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:51 akosiaris@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rdb[1009-1010].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - akosiaris@cumin1001"
15:50 akosiaris@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rdb[1009-1010].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - akosiaris@cumin1001"
15:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1005.eqiad.wmnet with OS bookworm
15:45 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
15:42 urbanecm: mwmaint2002: mwscript extensions/Flow/maintenance/FlowFixInconsistentBoards.php --wiki=frwiki # T352550
15:38 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
15:38 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
15:36 akosiaris@deploy2002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 07m 24s)
15:31 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
15:31 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
15:28 moritzm: added Kamila to pwstore
15:21 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[1009-1010].eqiad.wmnet
15:19 topranks: moving esams CR interconnect to 4x10G breakout cable T347403
14:27 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
14:27 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
14:27 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
14:27 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
14:27 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
14:27 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
14:26 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
14:26 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
14:26 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
14:26 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
14:26 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
14:26 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
14:26 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
14:26 akosiaris: cleanup rdb1009 from all deployment charts
14:26 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
14:26 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
14:26 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
14:25 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
14:25 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
14:20 hashar@deploy2002: Finished deploy [integration/docroot@88f69cc]: doc: link to the Gearman Java library (duration: 00m 05s)
14:20 hashar@deploy2002: Started deploy [integration/docroot@88f69cc]: doc: link to the Gearman Java library
14:18 hashar@deploy2002: Finished deploy [integration/docroot@1c2de6b]: doc: link to Disovery parent pom (duration: 00m 06s)
14:18 hashar@deploy2002: Started deploy [integration/docroot@1c2de6b]: doc: link to Disovery parent pom
14:09 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
14:08 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
14:05 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
14:05 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
14:03 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
14:03 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
13:48 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
13:48 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
13:32 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
13:31 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
13:30 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
13:30 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
13:28 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
13:28 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
13:27 taavi: run prometheus provision-fs on prometheus2* to create file system for cloud instance T350010
13:13 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
13:13 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
12:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
12:39 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
12:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flerovium.eqiad.wmnet
12:37 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flerovium.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
12:36 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flerovium.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
12:34 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1046.eqiad.wmnet with OS bookworm
12:33 jmm@cumin2002: START - Cookbook sre.dns.netbox
12:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts flerovium.eqiad.wmnet
12:17 XioNoX: add BGP custom field to Netbox - T306649
12:07 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
12:03 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
12:03 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jbond out of all services on: 2211 hosts
12:02 root@cumin2002: START - Cookbook sre.idm.logout Logging Jbond out of all services on: 2211 hosts
11:49 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bookworm
11:30 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on cr[1-2]-codfw,cr[1-2]-codfw IPv6 with reason: resetting line card
11:30 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on cr[1-2]-codfw,cr[1-2]-codfw IPv6 with reason: resetting line card
11:29 topranks: Reset card 1/0 in cr1-codfw T350159
11:22 topranks: Disabling BGP peering to AS1299 prior to reset of card 1/0 in cr1-codfw T350159
11:09 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jbond out of all services on: 2 hosts
11:09 root@cumin2002: START - Cookbook sre.idm.logout Logging Jbond out of all services on: 2 hosts
11:04 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jbond out of all services on: 2 hosts
11:04 root@cumin2002: START - Cookbook sre.idm.logout Logging Jbond out of all services on: 2 hosts
11:00 topranks: Draining cr1-codfw transport to cr3-eqsin to reset card 1/0 T350159
10:59 topranks: Resetting circuit preference for transports landing on card 1/1 cr1-codfw T350159
10:55 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
10:49 moritzm: installing wireshark security updates on bookworm
10:37 topranks: Moving VRRP active gateway for codfw row A/B vlans from cr1-codfw to cr2-codfw to reconfigure card 1/1 T350159
10:35 topranks: draining codfw<->eqiad transport link to reconfigure card 1/1 in cr1-codfw T350159
10:34 topranks: draining codfw<->eqdfw transport link to reconfigure card 1/1 in cr1-codfw T350159
10:30 akosiaris@deploy2002: Synchronized wmf-config/ProductionServices.php: (no justification provided) (duration: 07m 12s)
10:08 godog: add 60GB to prometheus/k8s in codfw
09:51 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jbond out of all services on: 2 hosts
09:51 root@cumin2002: START - Cookbook sre.idm.logout Logging Jbond out of all services on: 2 hosts
09:45 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jbond out of all services on: 2211 hosts
09:44 root@cumin2002: START - Cookbook sre.idm.logout Logging Jbond out of all services on: 2211 hosts
09:20 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
09:05 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
08:59 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
08:57 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
08:50 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1026.eqiad.wmnet with OS bookworm
07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1026.eqiad.wmnet with reason: host reimage
07:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1026.eqiad.wmnet with reason: host reimage
07:12 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1026.eqiad.wmnet with OS bookworm
06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2135.codfw.wmnet with OS bookworm
06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2135.codfw.wmnet with reason: host reimage
06:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2135.codfw.wmnet with reason: host reimage
05:56 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2135.codfw.wmnet with OS bookworm
05:37 marostegui: Failover m3 from db1119 to db1159 - T352360
05:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2134,2160].codfw.wmnet,db[1119,1159,1217].eqiad.wmnet with reason: m3 master switchover T352149
05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2134,2160].codfw.wmnet,db[1119,1159,1217].eqiad.wmnet with reason: m3 master switchover T352149
02:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2109.codfw.wmnet with OS bookworm
02:31 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
02:28 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
02:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2107.codfw.wmnet with OS bookworm
02:27 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
02:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2108.codfw.wmnet with OS bookworm
02:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
02:24 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
02:18 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
02:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2106.codfw.wmnet with OS bookworm
02:17 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
02:16 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
02:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2105.codfw.wmnet with OS bookworm
02:16 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
02:11 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
02:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2109.codfw.wmnet with reason: host reimage
02:07 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2109.codfw.wmnet with reason: host reimage
02:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2108.codfw.wmnet with reason: host reimage
02:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2107.codfw.wmnet with reason: host reimage
02:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2108.codfw.wmnet with reason: host reimage
01:58 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2107.codfw.wmnet with reason: host reimage
01:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2106.codfw.wmnet with reason: host reimage
01:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ceph2003']
01:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ceph2001']
01:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2106.codfw.wmnet with reason: host reimage
01:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2105.codfw.wmnet with reason: host reimage
01:51 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2105.codfw.wmnet with reason: host reimage
01:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2109.codfw.wmnet with OS bookworm
01:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2108.codfw.wmnet with OS bookworm
01:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ceph2002']
01:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2104.codfw.wmnet with OS bookworm
01:40 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:40 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2002']
01:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2107.codfw.wmnet with OS bookworm
01:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ceph2002']
01:40 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2003']
01:40 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2002']
01:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2001']
01:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ceph2003']
01:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ceph2002']
01:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ceph2001']
01:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2002']
01:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2001']
01:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ceph2003']
01:38 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
01:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2106.codfw.wmnet with OS bookworm
01:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
01:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
01:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2103.codfw.wmnet with OS bookworm
01:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:32 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2105.codfw.wmnet with OS bookworm
01:32 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2102.codfw.wmnet with OS bookworm
01:31 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2100.codfw.wmnet with OS bookworm
01:29 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2101.codfw.wmnet with OS bookworm
01:29 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:28 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2003.mgmt.codfw.wmnet with reboot policy FORCED
01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2002.mgmt.codfw.wmnet with reboot policy FORCED
01:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ceph2001.mgmt.codfw.wmnet with reboot policy FORCED
01:22 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2104.codfw.wmnet with reason: host reimage
01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ceph2001-3 to codfw - jhancock@cumin2002"
01:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ceph2001-3 to codfw - jhancock@cumin2002"
01:18 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2104.codfw.wmnet with reason: host reimage
01:17 jhancock@cumin2002: START - Cookbook sre.dns.netbox
01:14 foks: removing 120 files for legal compliance
01:11 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2103.codfw.wmnet with reason: host reimage
01:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2100.codfw.wmnet with reason: host reimage
01:07 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2102.codfw.wmnet with reason: host reimage
01:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2100.codfw.wmnet with reason: host reimage
01:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2101.codfw.wmnet with reason: host reimage
01:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2101.codfw.wmnet with reason: host reimage
00:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2104.codfw.wmnet with OS bookworm
00:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2103.codfw.wmnet with OS bookworm
00:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2102.codfw.wmnet with OS bookworm
00:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2101.codfw.wmnet with OS bookworm
00:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2100.codfw.wmnet with OS bookworm
00:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2098.codfw.wmnet with OS bookworm
00:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2099.codfw.wmnet with OS bookworm
00:38 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2097.codfw.wmnet with OS bookworm
00:38 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2094.codfw.wmnet with OS bookworm
00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:35 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:25 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1107.eqiad.wmnet with OS bookworm
00:22 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1107.eqiad.wmnet with OS bookworm
00:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2099.codfw.wmnet with reason: host reimage
00:14 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2099.codfw.wmnet with reason: host reimage
00:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1105.eqiad.wmnet with OS bookworm
00:09 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:08 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2098.codfw.wmnet with reason: host reimage
00:05 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1107.eqiad.wmnet with OS bookworm
00:05 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:03 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2098.codfw.wmnet with reason: host reimage
00:01 krinkle@deploy2002: Synchronized wmf-config/CommonSettings.php: (no justification provided) (duration: 06m 37s)
00:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2094.codfw.wmnet with reason: host reimage

Other archives

2000s

Archive 1: 2004 Jun - 2004 Sep
Archive 2: 2004 Oct - 2004 Nov
Archive 3: 2004 Dec - 2005 Mar
Archive 4: 2005 Apr - 2005 Jul
Archive 5: 2005 Aug - 2005 Oct, with revision history 2004-06-23 to 2005-11-25
Archive 6: 2005 Nov - 2006 Feb
Archive 7: 2006 Mar - 2006 Jun
Archive 8: 2006 Jul - 2006 Sep
Archive 9: 2006 Oct - 2007 Jan, with revision history 2005-11-25 to 2007-02-21
Archive 10: 2007 Feb - 2007 Jun
Archive 11: 2007 Jul - 2007 Dec
Archive 12: 2008 Jan - 2008 Jul
Archive 12a: 2008 Aug
Archive 12b: 2008 Sept
Archive 13: 2008 Oct - 2009 Jun
Archive 14: 2009 Jun - 2009 Dec
